REAL PARTY IN INTEREST

The real party in interest in this appeal is Max-Delbrück-Centrum für Molekulare
Medizin, the assignee of record, having its principal office at Robert-Roessle-Strasse 10,
13125 Berlin, Germany.

The assignee acquired its interest by virtue of an assignment recorded by the
assignment branch of the United States Patent and Trademark Office at reel 016624,
frame 0483.

Notwithstanding the assignment, the appellants, Karen Uhlmann, Peter
Nurnberg, and Anja Brinckmann, remain parties in interest because of the German law
covering employed inventors (Gesetz über Arbeitnehmererfindungen).



RELATED APPEALS AND INTERFERENCES 

Appellants' attorney is unaware of any other appeals or interferences which will 
directly affect or be directly affected by or have a bearing on the Board's decision in the
pending appeal.



STATUS OF CLAIMS 

Rejected Claims: 1 to 5, 7 to 20 and 22 to 39 

Allowed Claims: None 

Withdrawn Claims: None 

Claims Objected to: None 

Claims Cancelled: 6 and 21 

The appealed claims are: 1 to 5, 7 to 20 and 22 to 39 



Claims 1 to 5, 7 to 20 and 22 to 39 are pending in the application. Claims 6 and 
21 were cancelled. Claims 1 to 5, 7 to 20 and 22 to 39 stand twice rejected. Since, as 
of the date of this appeal, none of pending claims 1 to 5, 7 to 20 and 22 to 39 have been 
allowed, this appeal brief follows.

STATUS OF AMENDMENTS

Claims 1 to 5, 7 to 20 and 22 to 39 were rejected in the final action mailed on 
January 23, 2009. No subsequent amendments were filed.

SUMMARY OF CLAIMED SUBJECT MATTER

The claims cover a method for detecting the methylation status of a nucleotide of
a nucleic acid molecule. The method comprises treating a sample comprising the
nucleic acid molecule with an agent that converts said nucleotide, when it is in
methylated form or in non-methylated form, so that it pairs with a nucleotide with
which it would not normally pair. The so treated nucleic acid molecule is then amplified
with an amplification primer that is detectably labelled with a detectable label that forms
an anchor for removal of the single stranded nucleic acid molecules. The single stranded
nucleic acid molecule so generated is then real-time sequenced and the methylation
status of the nucleotide in the sample is detected or determined.

The method combines the treatment of the nucleic acid molecules to create new pairing
partners upon subsequent amplification, the amplification itself, and real-time sequencing
to provide a highly efficient method to detect, and optionally quantify, the methylation
status of nucleotides in a nucleic acid molecule. The method is amenable to high
throughput analysis, e.g., on microtiter plates on which, e.g., 96 different gene loci may
be screened. A pathological condition or the predisposition for said pathological
condition may be diagnosed using the method.

Claim 1 

Appellants' invention in independent claim 1 is directed to a method for
detecting the methylation status of a nucleotide at a predetermined position in a nucleic
acid molecule (page 4, lines 19 to 20; page 8, lines 17 to 20). The methylation status is
detected in a sample comprising the nucleic acid molecule (page 4, lines 21 to 22; page
5, lines 23 to 28). The sample is treated in an aqueous solution (page 4, line 22; page
6, lines 20 to 23) with an agent suitable for the conversion of said nucleotide if present
in (i) methylated form; or (ii) non-methylated form (page 4, lines 22 to 24; page 6, line 28
to page 7, line 2) to pair with a nucleotide normally not pairing with said nucleotide prior
to conversion (page 4, lines 23 to 24; page 7, lines 2 to 12). The so treated nucleic acid
molecule is amplified via at least one amplification primer (page 4, lines 24 to 25; page
13, lines 3 to 6) to produce an amplification product (page 13, lines 6 to 7) and the
amplification product is converted into single stranded amplified nucleic acid molecules
(page 13, lines 6 to 9; Figure 1A; Figure 1B, left hand side). The at least one
amplification primer is detectably labeled with a detectable label that forms an anchor
for removal of said single stranded amplified nucleic acid molecules (page 13, lines 3 to
20; Figure 1B, left hand side) to generate a single stranded amplified nucleic acid
molecule (page 13, lines 4 to 9). The single stranded amplified nucleic acid molecule is
real-time sequenced (page 4, line 25; page 13, lines 23 to 30) and it is detected whether
said nucleotide is methylated or not methylated at said predetermined position in the
sample (page 4, lines 26 to 27; page 14, lines 1 to 5).

Claim 12 

Independent claim 12 describes a method for diagnosing a pathological condition
or the predisposition for a pathological condition by determining the methylation status
of a nucleotide at a predetermined position in the nucleic acid molecule (page 17, lines
24 to 27; Figures 5 and 6; page 8, lines 17 to 20). The methylation status is
detected in a sample comprising the nucleic acid molecule (page 17, lines 27 to 28; page
5, lines 23 to 28). The sample is treated in an aqueous solution (page 17, line 28; page
6, lines 20 to 23) with an agent suitable for the conversion of said nucleotide if present
in (i) methylated form; or (ii) non-methylated form (page 17, lines 29 to 31; page 6, line
28 to page 7, line 2) to pair with a nucleotide normally not pairing with said nucleotide
prior to conversion (page 17, lines 30 to 31; page 7, lines 2 to 12). The so treated
nucleic acid molecule is amplified via at least one amplification primer (page 13, lines 3
to 6) to produce an amplification product (page 13, lines 6 to 9) and the amplification
product is converted into single stranded amplified nucleic acid molecules (page 13,
lines 6 to 9; Figure 1A; Figure 1B, left hand side). The at least one amplification primer is
detectably labeled with a detectable label that forms an anchor for removal of said
single stranded amplified nucleic acid molecules (page 13, lines 3 to 20; Figure 1B, left
hand side) to generate a single stranded amplified nucleic acid molecule (page 13, lines
5 to 9). The single stranded amplified nucleic acid molecule is real-time sequenced
(page 17, line 32; page 13, lines 23 to 30) and it is detected whether said nucleotide is
methylated or not methylated at said predetermined position in the sample (page 17,
line 33 to page 18, line 3) to diagnose said pathological condition or the predisposition
for said pathological condition (page 17, lines 24 to 27).

Claim 32 

Independent claim 32 describes a method for generating new nucleotide pairing 
partners upon amplification of at least one nucleic acid molecule for the detection of the 
methylation status in a nucleotide sample (page 8, lines 17 to 20). 

The methylation status is detected by providing at least one nucleic acid
molecule and treating the nucleic acid molecule with an agent suitable for the conversion
of said nucleotide if present in methylated form or non-methylated form (page 6, lines 28
to 32) to pair with nucleotide pairing partners normally not pairing with said nucleotide
prior to conversion (page 7, lines 2 to 12). The so treated nucleic acid molecule is
amplified via at least one amplification primer (page 11, line 13 to page 13, line 6) to
produce an amplification product (page 13, lines 6 to 7) and the amplification product is
converted into single stranded amplified nucleic acid molecules (page 13, lines 6 to 9;
Figure 1A; Figure 1B, left hand side). The at least one amplification primer is detectably
labeled with a detectable label that forms an anchor for removal of said single stranded
amplified nucleic acid molecules (page 13, lines 3 to 20; Figure 1B, left hand side) to
generate a single stranded amplified nucleic acid molecule (page 13, lines 4 to 9)
comprising said new nucleic acid pairing partners normally not pairing with said
nucleotide prior to conversion. The single stranded amplified nucleic acid molecule is
real-time sequenced (page 8, line 19; page 13, lines 23 to 30) and the amount of said
nucleotide pairing with said new nucleotide pairing partners is determined to detect the
methylation status of nucleotides of said nucleic acid (page 8, lines 10 to 15; page 9,
lines 2 to 6).

Claim 10 

Claim 10 is dependent from claim 1 (described above) and the method further
comprises calculating a frequency of methylated nucleotides from results of said
real-time sequencing (page 3, lines 4 to 5; page 3, lines 10 to 17; Fig. 2, top; Fig. 3, black
circles; Fig. 4; page 23, lines 23 to 27).
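
By way of illustration only, one way such a frequency could be calculated is sketched below in Python. The cited specification pages are not reproduced in this brief, so the formula is an assumption: after bisulfite treatment and amplification, a methylated cytosine is read as C and a non-methylated cytosine as T, and the frequency is taken as the share of the C signal in the combined C and T signal at the position. The function name `methylation_frequency` and the signal values are hypothetical.

```python
def methylation_frequency(c_signal, t_signal):
    """Fraction of methylated molecules at one position.

    Assumption (not from the record): c_signal and t_signal are the
    real-time sequencing signal intensities for C (methylated, not
    converted) and T (non-methylated, converted) at the same position.
    """
    total = c_signal + t_signal
    if c_signal < 0 or t_signal < 0 or total == 0:
        raise ValueError("signals must be non-negative and not both zero")
    return c_signal / total


# Hypothetical signals in which 5 of 100 signal units come from C.
frequency = methylation_frequency(c_signal=5.0, t_signal=95.0)
print(f"{frequency:.1%}")
```

The ratio form is chosen here only because it makes the result independent of the absolute signal level; the specification may calculate the frequency differently.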

Claim 25 

Claim 25 is dependent from claim 12 (described above) and the method further
comprises calculating a frequency of methylated nucleotides from results of said
real-time sequencing (page 3, lines 4 to 5; page 3, lines 10 to 17; Fig. 2, top; Fig. 3, black
circles; Fig. 4; page 23, lines 23 to 27).

Claim 34 

Dependent claim 34 is dependent from claim 10. Claim 10 is dependent from
claim 1 (described above) and the method further comprises calculating a frequency of
methylated nucleotides from results of said real-time sequencing (page 3, lines 4 to 5;
page 3, lines 10 to 17; Fig. 2, top; Fig. 3, black circles; Fig. 4; page 23, lines 23 to 27).
Claim 34 further comprises detecting an allele frequency, where an allele frequency of
5% can be detected (Fig. 2 and page 24, lines 7 to 14).

Claim 39 

Dependent claim 39 is also dependent from claim 10. Claim 10 is dependent
from claim 1 (described above) and the method further comprises calculating a
frequency of methylated nucleotides from results of said real-time sequencing (page 3,
lines 4 to 5; page 3, lines 10 to 17; Fig. 2, top; Fig. 3, black circles; Fig. 4; page 23, lines
23 to 27). Claim 39 further comprises detecting an allele frequency of 5% with a
standard deviation of not more than 1% (Fig. 2 and page 24, lines 7 to 14).

GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The grounds of rejection to be reviewed on appeal are as follows. 

Whether claims 1-5, 7-9, 11-12, 19-20, 22-24, 26-33 and 36 are obvious under
35 USC §103(a) over Uhlmann et al., CHANGES IN METHYLATION PATTERNS
IDENTIFIED BY TWO-DIMENSIONAL DNA FINGERPRINTING, Electrophoresis 20:
1748-55 (1999) (hereinafter "Uhlmann '99") in view of U.S. Patent No. 6,258,568 to
Nyren et al. (hereinafter "Nyren").

Whether claims 12-16, 18 and 38 are obvious under 35 USC §103(a) over
Uhlmann '99 in view of Nyren, and further in view of US Patent No. 5,786,146 to
Herman (hereinafter "Herman").

Whether claim 17 is obvious under 35 USC §103(a) over Uhlmann '99 in view of
Nyren and Herman as applied to claims 12 and 38, in further view of US Patent
Publication No. 2003/0232351 to Feinberg (hereinafter "Feinberg").

Whether claims 10, 25, 34 and 39 are obvious under 35 USC §103(a) over
Uhlmann '99 in view of Nyren as applied to claims 1 and 12, and further in view of US 
Patent 7,078,168 to Sylvan (hereinafter "Sylvan"). 

Whether claim 35 is obvious under 35 USC §103(a) over Uhlmann '99 in view of
Nyren as applied to claim 1 and further in view of US Patent Publication No.
2002/0086324 to Laird (hereinafter "Laird").

Whether claim 37 is obvious under 35 USC §103(a) over Uhlmann '99 in view of
Nyren as applied to claims 1 and 8 and further in view of US Patent 5,602,000 to Hyman
(hereinafter "Hyman").

ARGUMENT



A. THE COMBINATION OF UHLMANN '99 AND NYREN DOES NOT RENDER 
CLAIMS 1-5, 7-9, 11-12, 19-20, 22-24, 26-33, 36 AND 37 OBVIOUS 

Claims 1-5, 7-9, 11-12, 19-20, 22-24, 26-33 AND 36 

Claim 1 is an independent method claim. Claims 2 to 5, 7 to 9, (10), 11, 27, 30, 31,
33, (34), (35), 36, 37 and (39) are directly or indirectly dependent on claim 1.

Claim 12 is an independent method claim. Claims (13-18), 19, 20, 22-24, 26, 28, 
29 and (38) are directly or indirectly dependent on claim 12. 

Claim 32 is an independent method claim. 

The numbers in parentheses () indicate claims not part of the specified rejection.

The following will show that independent claims 1, 12 and 32 patentably
distinguish the appellants' invention over the combination of Uhlmann '99 and Nyren.

Uhlmann '99 observed differences in 2-D DNA fingerprinting patterns in genomic
tumor DNA relative to the blood DNA (non-tumor) of certain tumor patients. In one
experiment, the authors observed that a "spot" of a particular tumor/non-tumor DNA
fragment pair on a 2-D filter was of the same size, but showed different melting
behavior. The authors hypothesized that this was the result of different methylation
patterns at relevant regions of the DNA fragment of interest (Uhlmann '99, page 1748,
right col., first full paragraph).

Uhlmann '99 notes that methylation is recognized as an important factor in tumor 
development as it influences not only the expression of single genes, but also 
conformation of the DNA and the activity status of a whole chromosome (Uhlmann '99, 
page 1749, left col., first paragraph). 

After seeing fingerprinting patterns in methylated and non-methylated test DNAs (phage
lambda DNA) similar to those of the DNA of interest, the authors set out to test their
hypothesis, namely whether the differences in the melting characteristics of the spots
they had observed were indeed the result of differential methylation.

The authors of Uhlmann '99 decided to determine the methylation status of the DNA
fragments of interest using the so-called "bisulfite approach." This technique ("bisulfite
treatment") is based on sodium bisulfite-mediated conversion of non-methylated
cytosines to uracil and thus allows the identification of 5-methylcytosine (which is not
converted) in genomic (tumor/non-tumor) DNA (Uhlmann '99, page 1749, left col., first
full paragraph).
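
For illustration only, the conversion logic of the bisulfite approach described above can be sketched in Python. The sketch is not taken from Uhlmann '99 or from the record. The function names `bisulfite_convert` and `call_methylation` and the example sequence are hypothetical, and the model assumes complete conversion and an error-free read of a single strand.

```python
def bisulfite_convert(seq, methylated_positions):
    """Model bisulfite treatment followed by amplification of one strand.

    A non-methylated C is converted to uracil and is read as T after
    amplification; a 5-methylcytosine is not converted and is read as C.
    Positions are 0-based indices into seq (an assumption of this sketch).
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Map each C position of the reference to True if it was methylated."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted)):
        if ref == "C":
            calls[i] = obs == "C"
    return calls


reference = "ACGTCCGA"                      # hypothetical fragment
converted = bisulfite_convert(reference, methylated_positions=[1])
print(converted)
print(call_methylation(reference, converted))
```

Comparing the read sequence against the untreated reference is what turns the chemical conversion into a methylation call: a C that is still read as C was protected by methylation, and a C that is read as T was not.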

The data presented in Fig. 4 compares corresponding blood and tumor sample 
pairs and supports that the differences observed in the 2-D fingerprint spots correlate 
with differential methylation (Uhlmann '99, Fig. 4, page 1752, right column). 

Thus, Uhlmann '99 set out to show, and did show, that the differences observed in 2-D
fingerprints of blood and tumor DNA sample pairs could indeed be correlated to the
differential methylation status of these samples. The authors conclude that 2-D
fingerprinting can thus be used to distinguish between methylated and non-methylated
forms of the DNA in question (a DNA fragment which is, in tumor DNA, demethylated in
the melting domain, resulting in the differential 2-D fingerprint patterns: Uhlmann '99,
Fig. 2, page 1750, right column).

The Examiner concentrated in her rejection on the specific way in which the
authors of Uhlmann '99 tested their hypothesis that the differences in the 2-D fingerprinting
spots they observed are indicative of differential methylation of their tumor and blood
(non-tumor) samples.

In particular, Uhlmann '99 proceeds as follows: 
- Genomic (blood and tumor) DNA is, after cutting the DNA with restriction enzymes,
immobilized in agarose beads to fix the DNA in single stranded form (Uhlmann '99, Fig.
1, page 1750, left col., first step). As can be inferred from Figure 4 (Uhlmann '99, Fig. 4,
page 1752, right column), blood and tumor DNA are analyzed separately and thus are
treated in separate beads.

- The respective DNA is subjected to bisulfite treatment (Uhlmann '99, Fig. 1, page
1750, left col., second step).

- After termination of the reaction and washing of the beads, the treated DNA of interest,
still contained in the beads, is amplified by PCR in separate reactions for the sense and
antisense strands of the DNA (see Fig. 1 and description on page 1751, left column).

- The respective amplification products (for sense and antisense strand) are gel extracted
and cloned using a Topo TA cloning Kit (INVITROGEN, NL) to produce single stranded
DNA.

- The cloned single stranded amplification products are then sequenced by the
dideoxynucleotide chain-termination method (see para. 2.5, p. 1750, right column to
p. 1751, right col., l. 3 as well as Fig. 1).

Uhlmann '99 is cited for teaching "a method for identifying methylated cytosines 
comprising treating a sample containing genomic DNA derived from blood and tumor 
tissue with sodium bisulfite and amplifying the sample by PCR" (Office Action 1/23/09,
page 4, lines 4 to 7). The Examiner acknowledged that Uhlmann '99 teaches that the 
amplified nucleic acids were then cloned and plasmid DNA of the clones was prepared 
and sequenced using the dideoxynucleotide chain termination method to determine the 
methylation state of the amplified product. 

The Examiner further acknowledged:

- that Uhlmann '99 does not teach a method wherein the amplification primer has a label
that forms an anchor for removal of single stranded amplified nucleic acid molecules;

- that Uhlmann '99 does not teach a method wherein said amplification primer is labeled
with a biotin;

- that Uhlmann '99 does not teach that the amplified nucleic acids were sequenced using
a real-time sequencing method that comprises hybridizing a sequencing primer to a
single stranded nucleic acid, adding DNA polymerase and other components and
detecting a luminescence signal.

- Finally, the Examiner also acknowledged that Uhlmann '99 does not teach a
sequencing method that is a high throughput method (Office Action 1/23/09, page 4, last
three lines to page 5, line 6).

However, the Examiner argued that Nyren teaches "an alternative method for
sequencing. In the method of Nyren, PCR is performed using one or more primers that
carry a functional group such as biotin which permits subsequent immobilization and aids
in the separation of a single stranded DNA (col. 8, lines 1 to 5). Thus, Nyren is said to
teach a method wherein the amplification primer had a label that forms an anchor for
removal of single stranded amplified nucleic acid molecules" (Office Action 1/23/09,
page 5, lines 7-8). Nyren is further said to teach real time sequencing. Nyren names
many examples in which his method would provide benefits for the user (e.g., Nyren,
paragraph bridging col. 13 and 14). None of these examples includes a pretreatment of
the sample DNA, in particular not with an agent that modifies the DNA, and certainly not
a pretreatment that would allow one to detect methylation changes in the DNA as a
result of the fact that the DNA was changed in a way that the methylated/non-methylated
nucleotides show up as different bases in subsequent sequencing.
Nonetheless, the Examiner concluded that it would have been obvious to have modified
"the method of Uhlmann by using the sequencing method of Nyren which includes
performing PCR with at least one amplification primer labeled with biotin and then
sequencing the single stranded nucleic acid via pyrosequencing" (OA 1/23/09; page 6,
beginning of first full paragraph).

The Examiner cited a number of advantages that Nyren describes with respect to his
method, which include high throughput sequencing, an automated approach for large
scale sequencing, and handling multiple samples in parallel. The Examiner expressed the
opinion that the claimed method is obvious "because the substitution of PCR, cloning
and sequencing steps performed by Uhlmann for the PCR and sequencing steps
performed by Nyren would have been yielded predictable results to one of ordinary skill
in the art." (OA 1/23/09; page 6, line 22 to page 7, line 2).

Appellants will first argue the specifics of the rejections made and then will discuss the 
combination of the two references in more general terms.

CLAIMED ELEMENTS NOT ACCOUNTED FOR IN THE OBVIOUSNESS ANALYSIS

The Examiner based her rejection on the rationale that the combination of elements of
Uhlmann '99 with Nyren according to known methods yields predictable results (MPEP
§2143).

To support this rationale, Office personnel must resolve the Graham factual inquiries. 

Office personnel must, among others, articulate the following: 

(1) a finding that the prior art included each element claimed, although not
necessarily in a single prior art reference, with the only difference between the
claimed invention and the prior art being the lack of actual combination of the
elements in a single prior art reference. (emphasis added; MPEP §2143 A. 1.)

Claim 32 

The invention as claimed in claim 32 requires "determining the amount of said 
nucleotide pairing with said new nucleotide pairing partners." 

The Examiner has not provided any showing for the element "determining the amount of
said nucleotide pairing with said new nucleotide pairing partners" as set forth in the
claim and thus has not provided a complete analysis in accordance with MPEP §2143
A. 1.

Claims 1 and 12 

The invention as claimed in claims 1 and 12 and all claims dependent thereon requires
that treatment of the sample, e.g., with bisulfite, takes place in "an aqueous solution."
The specification clarifies that an "aqueous solution" may be water such as distilled
water, a buffered solution such as a phosphate buffered solution or a buffered solution
other than a phosphate buffered solution, to name some important examples (page 6,
lines 6 to 9; page 6, lines 20 to 23; [0016] of the application as published).

The bisulfite treatment in Uhlmann '99 takes place in agarose beads (Uhlmann '99, Fig.
1, page 1750, left column), which facilitates the modification procedure (see description
for Fig. 1, page 1750, left column). The DNA remains in Uhlmann '99's beads until after
the PCR (amplification step) takes place.

The Examiner has not provided any showing for the element "aqueous solution" as set
forth in the claims and thus has not provided a complete analysis in accordance with
MPEP §2143 A. 1.

THE SUBSTITUTION OF PCR, CLONING AND SEQUENCING STEPS PERFORMED
BY UHLMANN '99 FOR THE PCR AND SEQUENCING STEPS PERFORMED BY
NYREN WOULD NOT HAVE YIELDED PREDICTABLE RESULTS

The Examiner presently argues for the substitution of the PCR, cloning and sequencing
steps performed by Uhlmann '99 by the PCR and sequencing steps performed by
Nyren.

The substitution of the PCR, cloning and sequencing steps performed by Uhlmann '99
by the PCR and sequencing steps performed by Nyren thus leaves Uhlmann '99's DNA
that is to be subjected to Nyren's PCR and sequencing in the agarose beads (1.7% low
melting agarose) in which Uhlmann '99's bisulfite treatment took place and which have a
consistency that allows single stranded DNA to be fixed (Uhlmann '99, page 1750, right
column, first full paragraph). This might raise the question as to whether the relatively
complex sequencing reaction of Nyren that follows his PCR could be performed in such
an environment. More importantly, the Examiner did not make clear why predictable
results should be expected by employing a PCR using detectably labeled amplification
primers for subsequent real time sequencing of a DNA sample that is contained in
agarose beads. From the teachings of Uhlmann '99, which gel extracts the DNA after
PCR, the person skilled in the art would be under the impression that the PCR
(amplification) product would need to be gel extracted for further processing, in
particular sequencing (Uhlmann '99, page 1750, left column, first paragraph). The
Examiner provided no evidence or argument why, despite the lack of gel extraction, the
person of ordinary skill in the art would have recognized that the results of the
combination were predictable as the Examiner claims (MPEP §2143 A. 3.).
Appellants note that an additional step of a gel extraction would be at odds with the
identified advantages (speed etc.) that, according to the Office, would cause the person
skilled in the art to combine the references.

Appellants have, however, taken note of the fact that Uhlmann '99 teaches an
amplification which is followed by gel extraction of the amplification product, cloning to
produce single stranded DNA and sequencing.

Taking into account these additional elements taught by Uhlmann '99 in the combination
of Uhlmann '99 and Nyren, appellants would like to point out that Uhlmann '99's
amplification primers are not detectably labeled and that Uhlmann '99's amplification
product is, after gel extraction, cloned to produce single stranded DNA. The cloning
follows Uhlmann '99's amplification and gel extraction and precedes the sequencing
("[P]lasmid DNA of positive clones . . . were sequenced by the dideoxynucleotide
chain-termination method"; Uhlmann '99, see para. 2.5, sentence bridging page 1751, left col.
to page 1751, right col.). Appellants note that the person skilled in the art would be
reluctant to make the modification to Uhlmann '99's amplification primers, namely to
detectably label Uhlmann '99's amplification primers, as it would interfere with Uhlmann
'99's subsequent cloning step. For example, U.S. Patent 6,589,736 to Rothschild et al.
discloses in its background section, "PCR products that are biotinylated are not suitable
material for cloning." (col. 7, starting on line 23). The same patent also states in col.
34, starting on line 40, that "the presence of biotin on the nascent DNA can interfere with
its subsequent utilization in cloning or hybridization analysis."

Thus, applicants submit that a modification that employs the PCR as taught by
Uhlmann '99 would render Uhlmann '99 unsatisfactory for its intended purpose (see
MPEP §2143.01, V, citing In re Gordon, 733 F.2d 900, 221 USPQ 1125 (Fed. Cir.
1984)).

Furthermore, in In re Ratti, the court noted that the "suggested
combination of references would require a substantial reconstruction and redesign of
the elements shown in [the primary reference] as well as a change in the basic
principles under which the [primary reference] construction was designed to operate." In
re Ratti, 270 F.2d 810, 813 (CCPA 1959) (emphasis added).

Applicants respectfully submit that the combination of Uhlmann '99 and Nyren
would change the basic principles under which Uhlmann '99 was designed to operate.

Using the current rationale of the Examiner, which advocates a replacement of
Uhlmann '99's PCR, cloning and sequencing with Nyren's PCR and sequencing, leaves
Uhlmann '99's DNA in agarose beads (which were used to fix single stranded DNA) for this
PCR and sequencing.

Even though the Examiner did not argue that the bisulfite treatment could
have been performed in the solution that Nyren describes, e.g., in the paragraph
bridging col. 17 and 18, and thus a prima facie case was not made, such a change would
constitute a substantial reconstruction that would change the basic principle under
which Uhlmann '99 was designed to operate.

Taking up an earlier rationale of the Examiner that advocated a modification of
Uhlmann '99's PCR with the PCR taught by Nyren, the inclusion of labeled primers in
the PCR would not only render Uhlmann '99 inoperative for its intended purpose, but
also constitute a substantial reconstruction that would change the basic principle under
which Uhlmann '99 was designed to operate.

Not unlike the fact pattern in In re Ratti, the Office seeks to exchange a rather
laborious, but "waterproof" method with a high speed method. In re Ratti, 270 F.2d 810,
813 (CCPA 1959).

THE ADVANTAGES DESCRIBED BY NYREN AND CONSIDERED RELEVANT IN
THE OBVIOUSNESS ANALYSIS WOULD NOT MOTIVATE THE PERSON SKILLED
IN THE ART TO COMBINE THE TEACHINGS OF UHLMANN '99 AND NYREN

The Examiner relied in her obviousness analysis on the advantages that Nyren 
describes for his method. Appellants recognize that an advantage is a strong rationale
for combining references (MPEP §2144, II).

The claimed invention is a method for determining the methylation status of a 
nucleotide. The Examiner combines (a) a reference that seeks to confirm whether
differences in 2-D fingerprinting patterns are indeed the result of differential methylation
with (b) a reference that provides a new and fast sequencing method.

The question remains whether the person skilled in the art would have combined the
references considering the teachings of the references in view of the advantages cited
by the Office. Uhlmann '99 tried to obtain verification that the differences in 2-D
fingerprinting spots of tumor and non-tumor DNA are in fact a result of changed
methylation, a task that requires primarily precision. The prospect of an automatic
approach for large scale, non-electrophoretic sequencing procedures which allow for
continuous measurements and handling of multiple samples at the same time (1/23/09
Office Action, page 6), as described by Nyren, would be, if at all, at best of secondary
importance.

Nyren himself notes some issues with his method that could affect precision and thus
discourage usage in methods that involve a high degree of precision. In col. 7, starting
at line 15, Nyren notes:

"A potential problem which has previously been observed with PPi-based
sequencing methods is that dATP, used in the sequencing (chain
extension) reaction, interferes in the subsequent luciferase-based
detection reaction by acting as a substrate for the luciferase enzyme. This
may be reduced or avoided by using, in place of deoxy- or dideoxy
adenosine triphosphate (ATP), a dATP or ddATP analogue which is
capable of acting as a substrate for a polymerase but incapable of acting
as a substrate for a said PPi-detection enzyme." (emphasis added)

Furthermore, Nyren also makes clear that accumulation of reaction by-products may
take place. While the problem can be avoided by periodic washing, it would add to the
reluctance to use the method where precision is the primary goal, as in Uhlmann '99
(Nyren, col. 8, lines 57 to 60). By-products are clearly a concern in Nyren. The
additional components of the
reaction mixture and additional reaction products stemming from the bisulfite treatment 
should equally be of concern. 

Appellants also note that an obviousness analysis starts with an analysis of the 
prior art, not from the claimed invention. The question is, whether it would have been 
obvious, at the time the invention was made, to combine and/or modify the prior art to 
arrive at the claimed invention. 

In this context, appellants direct the Board's attention to the discussion of
non-obviousness in Ortho-McNeil Pharmaceutical v. Mylan Labs, 520 F.3d 1358, 86
U.S.P.Q.2d 1196 (Fed. Cir. 2008). In particular, appellants direct the Board's attention
to Judge Rader's notation that "In retrospect, Dr. Maryanoff's pathway to the invention, of
course, seems to follow the logical steps to produce these properties, but at the time of
invention, the inventor's insights, willingness to confront and overcome obstacles, and
yes, even serendipity, cannot be discounted." Id. at 1364. Hindsight-like reasoning is
improper only if it includes knowledge gleaned only from appellants' disclosure. In this
context, appellants note that in Ortho-McNeil the Court specifically stated that the TSM
test, flexibly applied (in the unpredictable arts), merely assures that the obviousness test
proceeds on the basis of evidence - teachings, suggestions (a tellingly broad term), or
motivations (an equally broad term) - that arise before the time of invention as the
statute requires. Appellants respectfully submit their belief that, for the reasons
provided above, the appropriate showing was not provided.

B. THE COMBINATION OF UHLMANN '99, NYREN AND HERMAN DOES 
NOT RENDER CLAIMS 12-16, 18 AND 38 OBVIOUS; 

C. THE COMBINATION OF UHLMANN '99, NYREN, HERMAN AND 
FEINBERG DOES NOT RENDER CLAIM 17 OBVIOUS 

Claim 12 is an independent method claim. Claims 13-16, 17, 18, (19, 20, 22-26, 28, 
29) and 38 are directly or indirectly dependent on claim 12. 

Claims 12-16 and 18 were rejected over Uhlmann '99 and Nyren in view of 
US Patent 5,786,146 to Herman. 

Claim 17 was rejected over Uhlmann '99 and Nyren in view of US Patent 
5,786,146 to Herman and further in view of US Publication 2003/0232351 to 
Feinberg. 

With regard to claim 12, the Office explains on page 4 that part of limitation (d), 
namely the "to diagnose a pathological condition" or the predisposition therefor, is 
not an actual step, but an intended use of claim limitation (d): 

"(d) detecting whether said nucleotide is methylated or not methylated 
at said predetermined position in the sample to diagnose said pathological 
condition or the predisposition for said pathological condition." (emphasis 
added) 

Appellants note that this language is presented in the body of the claim and not 
in the preamble (MPEP §2111.02). Appellants further note that this language does not 
constitute optional language in accordance with MPEP §2111.04. 

The Board is referred to the argument presented with respect to the independent 
claims. 

D. THE COMBINATION OF UHLMANN '99, NYREN AND SYLVAN DOES NOT 
RENDER CLAIMS 10, 25, 34 AND 39 OBVIOUS 

With respect to claims 10, 25, 34 and 39, the Examiner conceded: 

- that Uhlmann '99 when combined with Nyren does not teach a method further 
comprising calculating a frequency of methylated nucleotides from the results of said 
real time sequencing (claims 10 and 25). 

- that Uhlmann '99 when combined with Nyren does not teach a method wherein an allele 
frequency of 5% can be detected or a method wherein an allele frequency of 5% with a 
standard deviation of no more than 1% is detected (claims 34 and 39). 

The Examiner, however, states that US Patent 7,078,168 to Sylvan (hereinafter "Sylvan") 
teaches a method for determining the allele frequency in a population of nucleic acid 
molecules. The method is in particular used to determine allele frequencies for single 
nucleotide polymorphisms (SNPs) or other mutations or genetic variations (e.g. 
nucleotide insertions, additions or deletions, gene, chromosome or genome duplications 
(or multiplications) etc.) in pooled nucleic acid samples or other samples (including 
single samples) which may contain allelic variations. Sylvan makes no reference to the 
methylation status of his population of nucleic acid molecules and certainly not to 
"calculating a frequency of methylated nucleotides" as required by claims 10 and 25. 

The method of Sylvan relies on determining the frequency of an allele in a given 
population by pooling the nucleic acid sequences of said population and performing 
a "primer-extension" type reaction, using primers designed for particular SNPs/alleles. 
In a sequencing reaction a pattern of nucleotide incorporation in said primer extension 
products at the positions that correspond to said polymorphic position of interest can be 
obtained and the frequency of said allele from said pattern of nucleotide incorporation 
can be determined. 

With regard to claim 34, which is dependent on claim 10, the Office states that due to the 
use of the term "can be" the claim actually does not require detecting an allele 
frequency. 

In this context, appellants note that certain terms may raise the question as to whether they 
actually limit a claim (MPEP §2111.04). Appellants respectfully submit that this is not 
such a case. 

In particular, the "can be" term in the context provided clearly states an ability that is 
either present or not. That is, the method either can detect the allele frequency or not. 
This ability constitutes a limitation of the claim. The language is, in the context 
provided, not "conditional" (Office Action, 1/23/09, page 10, last line) and thus 
constitutes a proper limitation. 

With regard to claim 39, which is dependent on claim 10 and omits the term "can be", the 
Office appears to concede that Sylvan does not exemplify a method wherein an allele 
frequency of 5% was detected with a standard deviation of not more than 1% (appellants 
direct attention to Fig. 6 of Sylvan). However, the Examiner explains that from 
Sylvan's teaching there is an expectation that an allele frequency of 5% with a standard 
deviation of not more than 1% could be detected, but suggests that, even if the 
expected results do not end up being equivalent to the [claimed] results, it would have 
been obvious to modify the method in order to determine the recited allele frequency. 
The Examiner has not explained the basis for this "expectation" to support a prima facie 
case of obviousness. The key to supporting any rejection under 35 U.S.C. §103 is the 
clear articulation of the reason(s) why the claimed invention would have been obvious. 
The Supreme Court in KSR noted that the analysis supporting a rejection under 35 
U.S.C. §103 should be made explicit. The Court, quoting In re Kahn, 441 F.3d 977, 988, 
78 USPQ2d 1329, 1336 (Fed. Cir. 2006), stated that "'[R]ejections on obviousness 
cannot be sustained by mere conclusory statements; instead, there must be some 
articulated reasoning with some rational underpinning to support the legal conclusion of 
obviousness.'" KSR Intern. Co. v. Teleflex Inc., 550 U.S. 398, 418 (2007) (see MPEP 
§2141). 

In the obviousness reasoning, the Office merely refers to advantages of Sylvan, in 
particular the fact that this method "determines the exact sequence of a nucleic acid 
fragment while directly measuring the amount of nucleotide incorporated." The Office 
also refers to the accuracy, cost effectiveness and speed with which the information can 
be obtained. 

The Office, however, did not provide any reasoning why the person skilled in the art, 
apart from the advantages of Sylvan's method per se, would make the combination to 
arrive at the claimed invention, namely to calculate the frequency of methylated 
nucleotides, in particular with the accuracy set forth in claims 34 and 39. 



E. THE COMBINATION OF UHLMANN '99, NYREN AND LAIRD DOES NOT 
RENDER CLAIM 35 OBVIOUS; 

F. THE COMBINATION OF UHLMANN '99, NYREN AND HYMAN DOES NOT 
RENDER CLAIM 37 OBVIOUS 

Claim 35 is a method claim dependent upon claim 1. Claim 35 was rejected over 
Uhlmann '99 in view of Nyren as applied to claim 1 and in further view of Laird. 

The Board is referred to the argument presented with respect to claim 1. 

Claim 37 is a method claim dependent upon claim 8. Claim 8 is a method claim 
dependent upon claim 1. Claim 37 was rejected over Uhlmann '99 in view of Nyren 
as applied to claim 1 and in further view of US Patent 5,602,000 to Hyman. 

The Board is referred to the argument presented with respect to claim 1. 

CONCLUSION 

Having set forth the factual and legal basis which supports the patentability of the 
claims on appeal, it is respectfully submitted that claims 1 to 5, 7 to 20 and 22 to 39 are 
allowable. Accordingly, it is respectfully urged that the Board reverse the Examiner's 
rejection thereof. 



Respectfully submitted, 

By: /Joyce v. Natzmer/ 
Joyce von Natzmer 

Registration No. 48,120 
Customer No. 46002 
Direct Line: (301) 657-1282 

Pequignot + Myers LLC 
200 Madison Ave., 1901 
New York, NY 10016 



January 25, 2010 

CLAIMS APPENDIX 



1. A method for detecting the methylation status of a nucleotide at a predetermined 
position in a nucleic acid molecule comprising: 

(a) treating a sample comprising said nucleic acid molecule in an aqueous 
solution with an agent suitable for the conversion of said nucleotide if 
present in 

(i) methylated form; or 

(ii) non-methylated form 

(b) to pair with a nucleotide normally not pairing with said nucleotide prior to 
conversion; 

(c) amplifying said nucleic acid molecule treated with said agent via at least 
one amplification primer to produce an amplification product and 
converting said amplification product into single stranded amplified nucleic 
acid molecules, wherein said at least one amplification primer is 
detectably labeled with a detectable label that forms an anchor for removal 
of said single stranded amplified nucleic acid molecules to generate a 
single stranded amplified nucleic acid molecule; 

(d) real-time sequencing said single stranded amplified nucleic acid molecule; 
and 

(e) detecting whether said nucleotide is methylated or not methylated at said 
predetermined position in the sample. 

2. The method of claim 1 wherein said sample is derived from a tissue, a body fluid or 
stool. 

3. The method of claim 2 wherein said tissue is a tumor tissue, neurodegenerative 
tissue or a tissue affected with another neurological disorder. 

4. The method of claim 1 wherein said nucleic acid molecule is a DNA molecule or an 
RNA molecule. 

5. The method of claim 1 wherein in (b) the nucleic acid molecule is amplified via LCR 
or PCR. 

6. (Canceled) 

7. The method of claim 1 wherein said amplification primer is labeled with (a) biotin, 
(b) avidin, (c) streptavidin or (d) a derivative of (a), (b) or (c) or a magnetic bead. 

8. The method of claim 1 wherein said nucleotide of (a)(i) is an adenine, guanine or a 
cytosine. 

9. The method of claim 1 wherein said real-time sequencing comprises: 

(a) hybridization of a sequencing primer to said amplified nucleic acid 
molecule in single-stranded form; 

(b) addition of a DNA polymerase, a ATP sulfurylase, a luciferase, an 
apyrase, adenosine-phosphosulfate (APS) and luciferin; 

(c) sequential addition of dATP, dCTP, dTTP and dGTP; 

(d) detection of a luminescent signal wherein an intensity of the luminescent 
signal is correlated with the incorporation of a specific nucleotide at a 
specific position in the nucleic acid molecule and wherein the intensity of 
said signal is indicative of the methylation status of said nucleotide at said 
predetermined position. 

10. The method of claim 1, further comprising calculating a frequency of methylated 
nucleotides from results of said real-time sequencing. 

11. The method of claim 1 wherein said agent suitable for the conversion of said 
nucleotide to pair with nucleotide normally not pairing with said nucleotide is a 
bisulfite, preferably sodium bisulfite. 

12. A method for the diagnosis of a pathological condition or the predisposition for a 
pathological condition comprising detection of the methylation status of a nucleotide 
at a predetermined position in a nucleic acid molecule comprising: 

(a) treating a sample comprising said nucleic acid molecule in an aqueous 
solution with an agent suitable for the conversion of said nucleotide if 
present in 

(i) methylated form; or 

(ii) non-methylated form 

to pair with a nucleotide normally not pairing with a said nucleotide 
prior to conversion; 

(b) amplifying said nucleic acid molecule treated with said agent via at least 
one amplification primer to produce an amplification product and 
converting the amplification product into single stranded amplified nucleic 
acid molecules, wherein said at least one amplification primer is 
detectably labeled with a detectable label that forms an anchor for removal 
of said single stranded amplified nucleic acid molecules to generate a 
single stranded amplified nucleic acid molecule; 

(c) real-time sequencing said single stranded amplified nucleic acid molecule; 

and 

(d) detecting whether said nucleotide is methylated or not methylated at said 
predetermined position in the sample to diagnose said pathological 
condition or the predisposition for said pathological condition. 

13. The method of claim 12 wherein said pathological condition is cancer, a 
neurodegenerative disease or another neurological disorder. 

14. The method of claim 13 wherein said cancer is a primary tumor, a metastasis or a 
residual tumor. 

15. The method of claim 14 wherein said primary tumor is a glioma. 

16. The method of claim 15 wherein said glioma is an astrocytoma, oligodendroglioma, 
an oligoastrocytoma, a glioblastoma, or a pilocytic astrocytoma. 

17. The method of claim 38 wherein said neurodegenerative disease is Alzheimer's 
disease, Parkinson disease, Huntington disease, or Rett-Syndrome. 

18. The method of claim 38 wherein said neurological disorder is Prader-Willi- 
Syndrome, Angelman-Syndrome, Fragile-X-Syndrome, or ATR-X-Syndrome. 

19. The method of claim 12 wherein said nucleic acid molecule is a DNA molecule or an 
RNA molecule. 

20. The method of claim 12 wherein in (b) the nucleic acid molecule is amplified via LCR 
or PCR. 

21. (Canceled) 

22. The method of claim 12 wherein said amplification primer is labeled with (a) biotin, 
(b) avidin, (c) streptavidin or (d) a derivative of (a), (b) or (c) or a magnetic bead. 

23. The method of claim 12 wherein said nucleotide of (a)(i) is an adenine, guanine or a 
cytosine. 

24. The method of claim 12 wherein said real-time sequencing comprises: 

(a) hybridization of a sequencing primer to said amplified nucleic acid 
molecule in single-stranded form; 

(b) addition of a DNA polymerase, a ATP sulfurylase, a luciferase, an apyrase, 
adenosine-phosphosulfate (APS) and luciferin; 

(c) sequential addition of dATP, dCTP, dTTP and dGTP; 

(d) detection of a luminescent signal wherein the intensity of the luminescent 
signal is correlated with the incorporation of a specific nucleotide at a 
specific position in the nucleic acid molecule and wherein the intensity of 
said signal is indicative of the methylation status of said nucleotide at said 
predetermined position. 

25. The method of claim 12 further comprising calculating a frequency of methylated 
nucleotides from results of said real-time sequencing. 

26. The method of claim 12 wherein said agent suitable for the conversion of said 
nucleotide to pair with a nucleotide normally not pairing with said nucleotide is a 
bisulfite, preferably sodium bisulfite. 

27. The method of claim 1 wherein said method is a high-throughput method. 

28. The method of claim 12 wherein said sample is derived from tissue, a body fluid or 
stool. 

29. The method of claim 28 wherein said body fluid is blood, serum or urine. 

30. The method of claim 1 wherein said nucleotide is a cytosine and is part of one of the 
following sequences: CpG, CpNpG or CpNpN. 

31. The method of claim 1, wherein the methylation status of more than one 
predetermined nucleotide is detected and a number of samples are analyzed at the 
same time. 

32. A method for generating new nucleotide pairing partners upon amplification of at 
least one nucleic acid molecule for the detection of the methylation status of 
nucleotides of said nucleic acid molecule, said method comprising: 

(a) providing said at least one nucleic acid molecule; 

(b) treating said nucleic acid molecule with an agent suitable for conversion of 
a nucleotide if present in methylated form or non-methylated form to pair 
with nucleotide pairing partners normally not pairing with said nucleotide 
prior to conversion; 

(c) amplifying said nucleic acid molecule via at least one amplification primer 
to produce an amplification product and converting the amplification 
product into a single stranded nucleic acid molecules, wherein said at 
least one amplification primer is detectably labeled with a detectable label 
that forms an anchor for removal of said single stranded amplified nucleic 
acid molecules to generate a single stranded amplified nucleic acid 
molecule comprising said new nucleotide pairing partners normally not 
pairing with said nucleotide prior to conversion and; 

(d) real-time sequencing said single stranded amplified nucleic acid molecule; 

(e) determining the amount of said nucleotide pairing with said new nucleotide 
pairing partners to detect the methylation status of nucleotides of said 
nucleic acid molecule. 

33. The method of claim 1, wherein the methylation status of more than one 
predetermined nucleotide is determined. 

34. The method of claim 10, further comprising detecting an allele frequency, wherein an 
allele frequency of 5% can be detected. 

35. The method of claim 1, wherein said primer does not comprise CpG. 

36. The method of claim 1, wherein all nucleotides formerly methylated or not 
methylated in said nucleic acid molecule are detected. 

37. The method of claim 8, wherein said nucleotide of (a)(i) is an adenine or guanine. 

38. The method of claim 12, wherein said pathological condition is a neurodegenerative 
disease or another neurological disorder. 

39. The method of claim 10 further comprising detecting an allele frequency of 5% with a 
standard deviation of not more than 1%. 



EVIDENCE APPENDIX 

Uhlmann, K. et al., "Changes in methylation patterns identified by two-dimensional 
DNA fingerprinting," Electrophoresis 20(8):1748-55 (1999): This evidence was 
entered in the record per "Notice of References Cited" appended to the Office 
Action of July 17, 2007. 

US 6,258,568 (2001) to Nyren et al.: This evidence was entered in the record per 
"Notice of References Cited" appended to the Office Action of February 22, 2006. 

US 5,786,146 (1998) to Herman: This evidence was entered in the record per the 
"Notice of References Cited" appended to the Office Action of February 22, 2006. 

US 2003/0232351 (2003) to Feinberg: This evidence was entered in the record per 
"Notice of References Cited" appended to the Office Action of February 22, 2006. 

US 7,078,168 (2006) to Sylvan: This evidence was entered in the record per "Notice 
of References Cited" appended to the Office Action of July 17, 2007. 

US 2002/0086324 (2002) to Laird: This evidence was entered in the record per 
"Notice of References Cited" appended to the Office Action of July 17, 2007. 

US 5,602,000 (1997) to Hyman: This evidence was entered in the record per 
"Notice of References Cited" appended to the Office Action of June 19, 2008. 

Changes in methylation patterns identified by two-dimensional 
DNA fingerprinting 

Two-dimensional DNA fingerprinting (2-D fingerprinting) is a sensitive tool for genomic 
difference analysis between tumor DNA and constitutive DNA of glioma patients. 
Numerous differences were found even in low-grade gliomas. They can be interpreted 
as deletions, amplifications, rearrangements, HaeIII restriction site mutations, tandem 
repeat instabilities, or methylation differences. The influence of methyl groups on the 
melting behavior of double-stranded DNA fragments in a denaturing gradient gel was 
demonstrated by analyzing the migration of λ-phage DNA fragments in 2-D fingerprint 
gels. A characteristic intensity shift between two neighboring spots in several glioma 
samples was identified and verified by rehybridization of 2-D filters with a cloned DNA 
fragment corresponding to the lower spot in 10 out of 11 pilocytic astrocytomas. We 
hypothesized that this shift may be related to an alteration in the methylation pattern of 
the tumor DNA. This was specifically tested by analyzing the underlying 750 bp 
genomic fragment (including 21 CpG dinucleotides) with bisulfite treatment of 
agarose-embedded DNA. A methylation grade of 88% in tumor DNA as compared to 96% in 
blood DNA was found. Although only one CpG is located in the melting domain of the 
cloned fragment, this particular CpG is methylated in all blood samples, but mostly 
demethylated in the tumor samples. In conclusion, we demonstrate that 2-D fingerprinting 
may be a powerful tool for the detection of DNA methylation changes in 
genomic difference analysis. 

Keywords: Two-dimensional DNA fingerprinting / Methylation / Bisulfite treatment / Tumor / 
Pilocytic astrocytoma EL 3486 



1 Introduction 

Two-dimensional (2-D) DNA fingerprinting combines two separation techniques. In the 
first dimension DNA restriction fragments are separated according to their size. In 
the second dimension separation is achieved by denaturing gradient gel electrophoresis 
(DGGE), revealing differences in base composition and nucleotide sequence of 
the fragments. The additional dimension significantly improves the power to detect 
somatic changes in genomic DNA as compared to one-dimensional DNA fingerprinting [1]. 
The principle of this methodology was first applied to detect variations in the 
Escherichia coli genome [2]. The introduction of blotting and hybridization 
with probes specific for repetitive sequences made this method applicable to the 
analysis of complex eukaryotic genomes [3]. Various applications in different fields of 
research have been reported [4-10]. 

In a recent study, we used 2-D DNA fingerprinting successfully to screen low-grade 
gliomas for changes in genomic DNA in comparison to the blood DNA of the 
patients [11]. Between two and 11 alterations were detected in the 28 blood/tumor 
pairs analyzed. We believe that these alterations are related to DNA sequences 
involved in tumor initiation and/or progression. Some of the affected DNA fragments 
were cloned and rehybridized onto the 2-D filters [12]. For one of these cloned 
probes, the same spot difference was demonstrated in comparisons of 8 out of 9 
independent pilocytic astrocytomas and in 1 of 2 ependymal tumors with the respective 
constitutive DNAs. A particular spot in the blood DNA pattern that seemed to be split 
in tumor DNA was observed in these experiments. The two spots were of the same 
size but showed a different melting behavior, with the additional upper spot in the 
tumor DNA being immobilized at a lower melting temperature than the original lower spot 
known from the blood pattern. We interpreted this phenomenon to be a consequence of 
different methylation patterns of this fragment. 

Prokaryotes show methylation at different nucleotides corresponding to a specific 
pattern of methylases and restriction enzymes. In contrast, in human DNA only 
cytosines followed by guanine (CpG) appear to be methylated [13]. Cytosine methylation 
as an epigenetic modification is known to influence a variety of nuclear processes, such 
as replication of DNA and gene expression [13]. Furthermore, methylation is recognized 
as an important factor in tumor development [14]. It influences not only the expression 
of single genes but also the conformation of the DNA strands in extended regions and 
the activity status of whole chromosomes [15]. Both hypo- and hypermethylation 
are known to play a crucial role in these regulation processes [13]. As a general rule, 
a loss of methyl groups leads to increased gene expression [14]. Tumor suppressor 
genes may be associated with CpG islands, which are not methylated constitutively and 
thus can be a target of hypermethylation. Oncogenes may also undergo a 
change in methylation but in these cases a loss of methyl groups is expected to occur 
in tumorigenesis. 

In this study, we present definitive evidence for differential methylation in the blood 
and tumor of a specific 2-D DNA fingerprinting spot. Initially, 2-D patterns of methylated 
and nonmethylated λ-DNA were investigated. Usually, the restricted phage genome serves 
as a standardization marker for the highly variable and complex 2-D patterns 
of eukaryotic DNA. The methylated λ-DNA showed various "twin spots" whereas in the 
2-D pattern of nonmethylated phage DNA the previously characterized 37 single 
marker spots emerged [16]. This result strengthened our hypothesis and prompted us to 
determine the methylation state of distinct nucleotides within a human DNA fragment 
of interest using the bisulfite approach [17]. This technique is based on sodium 
bisulfite-mediated conversion of nonmethylated cytosines to uracil. It allows the 
identification of 5-methylcytosines (5mC) in genomic DNA. Only small amounts of material 
are needed due to subsequent PCR amplification of the modified DNA. After cloning and 
sequencing of the strand-specific PCR products, this method reveals the methylation 
status of distinct CpGs in individual DNA strands. We found the tumor DNA of the 
investigated glioma-specific spot to be hypomethylated. Moreover, a distinct CpG within 
a single-copy sequence showed consistent demethylation in pilocytic astrocytomas 
and thus may indicate the presence of a candidate oncogene in that region. 
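
The bisulfite principle described above can be illustrated with a minimal in-silico sketch. It is not part of the published protocol: the function names, the 12-base example fragment and the methylated position are hypothetical, and the sketch assumes complete conversion and error-free sequencing. Nonmethylated cytosines are read as T after conversion and PCR, while 5-methylcytosines are still read as C, so comparing the converted read with the untreated reference reveals the methylation state of each CpG.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it would be read after bisulfite treatment and PCR.

    Nonmethylated C is converted to U (read as T after PCR);
    5-methylcytosine is protected and is still read as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_cpg_methylation(reference, converted):
    """Map the index of each CpG cytosine in `reference` to True if methylated."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            # A C that survived conversion must have been methylated.
            calls[i] = converted[i] == "C"
    return calls


# Hypothetical fragment with CpGs at indices 2 and 8; only index 2 is methylated.
reference = "ATCGTTCACGGA"
converted = bisulfite_convert(reference, methylated_positions=[2])
calls = call_cpg_methylation(reference, converted)

# Fraction of CpGs called methylated in this single strand.
methylation_grade = sum(calls.values()) / len(calls)
```

In practice such a fraction would be pooled over many cloned strands and CpG positions; the single strand here only shows the bookkeeping.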

2 Materials and methods 

2.1 2-D DNA fingerprinting 

Genomic DNA from peripheral blood lymphocytes and from tumor tissue was prepared as 
described previously [11]. Ten micrograms from each DNA sample were digested with 
50 units of HaeIII restriction enzyme according to the supplier's recommendation 
(Gibco BRL, Eggenstein, Germany). Lambda phage DNA served as marker 
DNA. The marker was prepared by separate digestions of wild-type methylated λ-DNA 
from DNA adenosine methylase (dam+), DNA cytosine methylase (dcm+) Escherichia 
coli Le597 (cIind 1 ts857 Sam7; Gibco BRL) and nonmethylated λ-DNA from Tm-, dam-, 
and dcm- E. coli GM 119 (Sigma, Deisenhofen, Germany) with the restriction 
enzymes HaeIII, BglI and RsaI according to the supplier's recommendations 
(Gibco BRL). Next, the λ-DNA fragments were mixed. 

The first-dimensional electrophoresis was run in a 6% polyacrylamide gel (PAG; 
premixed acrylamide/bisacrylamide 37.5:1, 30% solution; Roth, Karlsruhe, Germany) 
at 50°C for 4 h at 150 V. The second-dimensional electrophoresis was run at 60°C for 
15 h at 150 V in a 6% PAG as well, but with a linear concentration gradient of 
denaturant included (100% denaturant = 7 M urea, 40% formamide; gradient 10-75%). 
After gel electrophoresis the DNA was blotted onto a positively charged nylon 
membrane (Qiabrane; Qiagen, Hilden, Germany) by semidry electroblotting. The obtained 
filters were hybridized with different mini- and microsatellite core probes. More 
details about the experimental conditions have been published elsewhere [1,11]. 

2.2 Identification of clustered changes 

We used the well-defined marker pattern of the restriction enzyme cleaved λ-DNA for 
standardization. Upon analysis of the spot alterations of 34 patients, a significant 
clustering of spot changes was observed. The standardization 
method developed for 2-D DNA fingerprints is described elsewhere [18]. 

2.3 Elution and cloning of DNA fragments 

Spots from 2-D gels were cloned using a protocol which includes the preparation of a 
duplicate and a master gel [12]. The nylon membrane ("master blot") was used for 
spot localization in a standard 2-D fingerprinting experiment using the minisatellite 
probe for the spot cluster of interest. The duplicate diethylaminoethyl (DEAE) 
membrane was used for spot elution and cloning. The spot position was determined with 
the help of the λ-DNA marker that was used on both gels. The spot DNA was 
recovered from the DEAE membrane by high salt elution and amplified by PCR after 
ligation of adaptor-oligo cassettes. Finally, the PCR products were cloned and used 
as single locus probes to rehybridize the 2-D filters. 

2.4 Mapping of single locus probes 

Sequencing of inserts containing the mini- or microsatellite was performed using the 
Thermo Sequenase cycle sequencing kit (Amersham Pharmacia Biotech, Cleveland, 
Ohio, USA). Single-copy sequences flanking the repeat element of the cloned DNA 
fragment were PCR-amplified and used as probe for a genomic library screening. 
P1 library filters as well as positive P1 clones were obtained from GenomeSystems 
(St. Louis, MO, USA). Hybridization of the filters was carried out according to 
the supplier's instructions. Chromosomal localization of the P1 clones was determined 
by fluorescence in situ hybridization (FISH). Clone DNA was labeled with biotin 
by nick translation and hybridized to normal human lymphocyte metaphase chromosomes. 
Regional assignment of signals was determined by analysis of ten well-spread 
metaphases. A 4',6-diamidin-2-phenylindol-dihydrochloride (DAPI) counter-staining was 
performed for band allocation. 



[Figure 1 flowchart. Steps: genomic DNA (blood & tumor) restricted with EcoRI; 
5 min, 95°C; agarose beads; bisulphite treatment, 4 h, 50°C; PCR; cloning; sequencing. 
Side labels: fixing of single-stranded DNA; strand-specific amplification. 
Sequencing lanes: GATC GATC.] 

Figure 1. Overview of the experimental approach to 
determine the methylation state of distinct CpGs. The 
immobilization of the single-stranded DNA in agarose 
beads facilitates the modification procedure. After conversion, 
sense and antisense strands are no longer complementary. 
Hence, separate PCR reactions are required for 
the analysis of both strands. 
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2.5 Determination of ^"^C by bisulfite treatment 

The protocol is outlined in Fig. 1. Genomic DNA was 
digested with the restriction enzyme EcoRI (Promega, 
Mannheim, Germany) and precipitated with ethanot. 
About 100 ng denatured DNA (5 min, 95°C) in 1.7% low 
melting agarose (Sigma) were dropped into chilled min- 
eral oil to form agarose beads [19]. The fixed single- 
stranded DNA was subjected to bisulfite treatment (2.5 m 
sodium metabisulfite, 125 mM hydroquinone, pH 5.0) for 
4 h at 50^0. Tubes where covered to protect the reaction 
against light. After the recommended reaction time, the 
agarose beads were washed four times in an appropriate 



A 

blood 



tumor 



B 



589 bp: 



Hae ill 

n 



Hae 111 

— ^ — 



blood »"^Q'""9 domain^ 



GC-ctamp 



I-CHJ 



tumor 4- 



CpG 



Figure 2. (A) Hybridization of 2-D filters with the single 
locus probe 48a. The intensity shift obtained with tumor 
DNA as compared to the patients' blood DNA is typcial for 
pilocytic astrocytomas and was observed in a number of 
different patients. Here, the corresponding hybridization 
signals of patient 493 are shown (cf. Fig. 4 in [12]). (B) 
Schematic presentation of the 2-D DNA fragment 
detected with probe 48a. Haelll Is used for digestion of 
the genomic DNA prior to 2-D electrophoresis. Thus, any 
fragment is flanked by Haelll restriction sites. The frag- 
ment contains a large GC-rich repetitive sequence acting 
as a clamp during the melting process. The lowest melting 
domain Is responsible for the mobility of the fragment in 
the denaturing gradient. Demethylation of CpGs within 
this domain might facilitate the melting of the fragment in 
the tumor DNA. 
amount of 1 x TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA) 
followed by a desulfonation step in 0.2 M NaOH. This 
reaction was stopped by adding a 1/5 volume of a 1 M 
hydrochloric acid solution. Again, the agarose beads were 
rinsed in 1 x TE and then used for PCR. Beads were 
stored up to four weeks at 4°C. The sequence of interest 
was amplified by PCR in separate reactions for the sense 
and the antisense strand. Each reaction mixture included 
one agarose bead with about 100 ng of bisulfite-treated 
DNA in a total volume of 50 µL. PCR reactions were car- 
ried out for 40-45 cycles with denaturation at 95°C, 
annealing at 54-58°C and extension at 72°C. The PCR 
products were gel extracted (QIAquick; Qiagen) and 
cloned using the Topo TA Cloning Kit from Invitrogen 
(Leek, The Netherlands). Plasmid DNA of positive clones 
was prepared with QIAprep (Qiagen) and sequenced by 
the dideoxynucleotide chain-termination method using 
the Thermo Sequenase cycle sequencing kit from Amer- 
sham Pharmacia Biotech. 
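
Why the two strands need separate PCR reactions after bisulfite treatment can be illustrated with a small sketch. It is not part of the published protocol: the toy sequence, the function names and the choice of methylated positions are assumptions made only for illustration. Unmethylated cytosines are read as thymine after conversion and amplification, while methylated cytosines stay cytosine.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(strand, methylated_positions=frozenset()):
    """Read-out of one strand after bisulfite treatment and PCR.

    Every C is reported as T unless its 0-based position is listed in
    methylated_positions (hypothetical input describing 5mC positions).
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(strand)
    )


# Hypothetical 8-nt duplex with one symmetrically methylated CpG.
sense = "ACGTCCGA"
antisense = reverse_complement(sense)             # TCGGACGT

converted_sense = bisulfite_convert(sense, {1})           # CpG C at index 1 is methylated
converted_antisense = bisulfite_convert(antisense, {5})   # its partner CpG on the other strand

# After conversion the two strands are no longer complementary,
# so each strand needs its own primer pair.
strands_still_complementary = (
    reverse_complement(converted_sense) == converted_antisense
)
```

In this toy case `converted_sense` keeps only the methylated C, and `strands_still_complementary` is false, which is the situation the strand-specific primer pairs address.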

3 Results and discussion 

3.1 A tumor-related 2-D spot shift specific for 
pilocytic astrocytomas 

A single-locus probe (48a) cloned from a complex 2-D 
DNA fingerprinting pattern and mapped to 11q14 [12] 
revealed a typical intensity shift between two neighboring 
spots when rehybridized to 2-D filters with blood and 
tumor DNA of different patients (Fig. 2A). Ninety-one per- 
cent of all pilocytic astrocytomas screened (10/11) and 
11% of the screened astrocytomas (2/18) showed this 
effect. The observed phenomenon was interpreted as a 








Figure 3. Comparison of 
methylated and nonmethylated 
λ-DNA by 2-D DNA fingerprint- 
ing. Both DNA samples were 
analyzed on the same gel to 
guarantee identical running con- 
ditions. Additional upper and 
lower spots may be due to dam- 
site and dcm-site methylations, 
respectively. The upper details 
show typical twin spots (circled) 
for the partially methylated DNA. 
The lower panels reveal a shift of 
a fragment (arrow) known to pos- 
sess two adenosine methylation 
sites in its melting domain. They 
are thought to be completely 
methylated in the left sample, 
thereby lowering the melting 
temperature of the sequence. 
%UF, percentage of denaturant 
(100% denaturant: 7.0 M urea/ 
40% formamide). 
consequence of changes in the methylation pattern within 
the melting domain of the fragment (Fig. 2B). A loss of 
methyl groups was postulated because the change 
seemed to lower the melting temperature. This was in- 
ferred from experiments performed by Collins and Myers 
[20] who studied the influence of one or a few methylated 
bases within the melting domain on the denaturing behav- 
ior of this part of the DNA fragment. In human DNA, 
nearly exclusively cytosine residues in CpGs are methyl- 
ated [14]. According to Collins and Myers, the loss of this 
modification in cytosine residues leads to a destabilization 
of the melting domain and thus to a lower melting temper- 
ature (Tm), i.e., a higher spot position within the 2-D gel. 



3.2 Influence of methylation on patterns 

Results obtained from 2-D experiments performed with 
methylated and nonmethylated λ-DNA strengthened our 
hypothesis of a demethylation event in the tumor DNA 
fragment mentioned above. We separated restriction- 
digested partially methylated (> 50% of dam and dcm 
sites) and nonmethylated λ-DNA on a 2-D DNA finger- 
printing gel and observed a variety of alterations when 
comparing the two separation patterns (Fig. 3). In the 2-D 
DNA fingerprinting pattern of the dam+/dcm+ λ-phage, 
twin spots and shifts of single spots as compared to the 
nonmethylated λ-DNA were observed. The double spots 
indicate sites where the fragments exist in two different 
methylation states. The single spot shifts may represent 
fragments turning up only in the modified state if there are 
active methylation enzymes present. Phage λ possesses 
116 dam and 23 dcm sites which have different influences 
on the melting behavior of the cleaved phage DNA. 
Methylated adenosines decrease the Tm whereas methyl- 
ated cytosine residues increase it [20]. Upon in-depth 
analysis of the spots defined in the recent empirical stand- 
ard pattern analysis [16], we verified the above mentioned 
influences on spots showing alterations in their separation 
behavior. All spots of the λ-phage isolated from the dam+/ 
dcm+ E. coli strain, which were immobilized earlier in the 
denaturing gel electrophoresis run than the corresponding 
spots of the nonmethylated λ-DNA, contained more dam 
sites than dcm sites (data not shown). However, some of 
the standard pattern spots did not seem to be immobilized 
on different denaturant concentrations although they har- 
bored methylation sites as well. A reason for this may be 
that methylation of certain sites varies with environmental 
stimuli; thus, these sites may be unmethylated even in the 
dam+/dcm+ strain [21]. Another explanation is that 
although the fragments harbor methylation sites, none of 
these are located in the melting domain and, hence, dif- 
ferential methylation has no impact on the melting behav- 
ior. 
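
The comparison of dam and dcm sites per fragment described above can be sketched in a few lines. This is only an illustration, not the analysis actually used: it assumes the usual recognition motifs (GATC for dam, CCAGG/CCTGG for dcm), the function names are hypothetical, and it ignores whether a site lies inside the melting domain.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all counted.
DAM_SITE = re.compile(r"(?=GATC)")        # dam methylates the A in GATC
DCM_SITE = re.compile(r"(?=CC[AT]GG)")    # dcm methylates the inner C in CCWGG


def count_sites(fragment):
    """Count dam and dcm recognition motifs in one restriction fragment."""
    seq = fragment.upper()
    return {
        "dam": len(DAM_SITE.findall(seq)),
        "dcm": len(DCM_SITE.findall(seq)),
    }


def dominant_modification(fragment):
    """Report which site type is more frequent ('dam', 'dcm' or 'equal')."""
    counts = count_sites(fragment)
    if counts["dam"] > counts["dcm"]:
        return "dam"
    if counts["dcm"] > counts["dam"]:
        return "dcm"
    return "equal"


# Hypothetical fragment with two dam sites and one dcm site.
example_counts = count_sites("GATCGATCCAGG")
```

A fragment for which `dominant_modification` returns "dam" would, under the influences cited from [20], be a candidate for melting at a lower denaturant concentration when methylated, provided the sites lie in its melting domain.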



3.3 Methylation state of the cloned spot DNA 
fragment 

The methylation status of 14 CpGs located in the cloned 
589 bp fragment corresponding to probe 48a was investi- 
gated by sequencing bisulfite-treated DNA from this frag- 
ment (Fig. 4). We chose this method because the PCR 
amplification steps in the protocol allow for low amounts 
of DNA to be analyzed. We created two strand-specific 
primer pairs yielding fragments of 746 and 785 bp, re- 
spectively, extending the original cloned sequence by 
either 100 and 56 bp or 95 and 100 bp at the ends. The 
flanking sequence information was obtained by sequenc- 
ing the corresponding P1 clone. The extended sequence 
contained 21 CpGs, with four of them situated in positions 
unfavorable for analysis, either too close to an end (CpG 
No. 1 and 2) or in GC-rich repetitive elements (CpG No. 
13 and 14), that are difficult to sequence. Hence, we 
Figure 4. Analysis of differentially methylated CpGs by 
bisulfite treatment. Sequence ladders of a corresponding 
blood/tumor sample pair (patient No. 493) after chemical 
modification are flanked by the original sequences of the 
untreated DNA. All cytosines appear as thymines (aster- 
isk) unless they were methylated in the original genomic 
DNA. Two CpGs (circled) are present in the sequence. In 
the blood DNA both CpGs are methylated. In the tumor 
DNA, however, the lower one (CpG No. 3) has lost its 
methyl group (arrow). 
Figure 5. Melting profile of the 
cloned spot DNA fragment. For 
each nucleotide position the tem- 
perature needed to bring this 
base pair into an equilibrium of 
50% melted and 50% hydrogen 
bonded molecules is plotted. The 
three plateaus seen each rep- 
resent a melting domain. How- 
ever, the two domains on the left 
side will in fact behave as one 
domain because of the minimal 
difference of only 1-2°C between 
their melting temperatures. The 
fragment begins to melt in this 
region at temperatures at which 
the long high melting domain 
still resists melting (GC-clamp). 
Since the partially melted mole- 
cule reduces its electrophoretic 
mobility drastically, only the 
sequence of the lower melting 
domains determines the vertical 

position of the fragment in the 2-D gel. The distribution of the CpGs given at the bottom reveals that only CpG number 7 is 
situated in the lower melting domain. Thus, differential methylation of this particular CpG will exclusively influence the initial 
melting behavior of the fragment. 
focused our study on the other 17 CpGs in the sequence. 

On average, five blood and ten tumor clones of the sense 
and antisense PCR products were sequenced and ana- 
lyzed for every patient. 

The data for three patients with a pilocytic astrocytoma 
are summarized in Table 1. Tumor DNA is hypomethyl- 
ated in comparison to the blood DNA. The average meth- 
ylation grade in blood was 96% compared to 88% in 
tumor in the summary over all CpGs in all clones. In pa- 
tient No. 16026, the same number of different CpGs was 
found to be demethylated in at least one clone in blood 
and tumor (i.e., same number of grey cells in Table 1); 
however, the ratio of methylated to demethylated clones 
was lower in the tumor. Furthermore, other CpGs were in 
part demethylated in blood and tumor of that patient. The 
CpG number 21 was the only one found consistently 
demethylated in all samples tested, whether blood or 
tumor. Four CpGs were found to be consistently methyl- 
ated in blood and tumor, which is to be expected in view 
of the high level of global methylation in the genomic 
region. CpG number 7 was consistently methylated in all 
blood samples but demethylated in all tumor samples. 
This finding has special significance because this particu- 
lar CpG is the only one situated in the melting domain of 
the original fragment cloned from the shifted 2-D DNA fin- 
gerprint spot (Fig. 5). Thus, we proved our hypothesis that 
a tumor-related demethylation event accounts for the ob- 
served intensity shift of two neighboring spots in the com- 
plex 2-D DNA fingerprint pattern (see Fig. 2). The lower, 
methylated spot never completely disappeared in the 
tumor DNA samples. This may be due to contaminating 
nontumor tissue or incomplete demethylation of CpG 
number 7 in the tumors. We prefer the former explana- 
tion, given our previous experience regarding the purity of 
tumor samples. In some cases, the blood samples 
showed a faint upper spot in addition to the prominent 
lower one. This may indicate that even in normal tissue, 
methylation of CpG number 7 is not absolutely complete. 
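
The "methylation grade" summarized over all CpGs in all clones, and the per-CpG marking used in Table 1, can be computed as in the following sketch. The clone strings are invented toy data, not the study's sequencing results, and the encoding ("M" for methylated, "U" for demethylated, one character per CpG) is an assumption made for illustration.

```python
def methylation_grade(clones):
    """Fraction of methylated calls over all CpGs in all clones."""
    calls = [call for clone in clones for call in clone]
    return sum(call == "M" for call in calls) / len(calls)


def demethylated_cpgs(clones):
    """0-based CpG indices demethylated in at least one clone
    (the condition marked with X in Table 1)."""
    n_cpgs = len(clones[0])
    return [
        i for i in range(n_cpgs)
        if any(clone[i] == "U" for clone in clones)
    ]


# Hypothetical calls for four CpGs: two blood clones, three tumor clones.
blood_clones = ["MMMM", "MMMU"]
tumor_clones = ["MUMU", "MMMU", "MUMM"]

blood_grade = methylation_grade(blood_clones)   # 7 of 8 calls methylated
tumor_grade = methylation_grade(tumor_clones)   # 8 of 12 calls methylated
```

Note that two samples can mark the same number of CpGs while differing in grade, which is the situation described for patient No. 16026.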

A reduced level of global DNA methylation is a common 
finding in a variety of cancer types [22-24]. Local hypo- 
methylation has been reported to be associated with a 
higher expression of oncogenes such as c-Ha-ras [25], 
mage-1 [26], and erb-A1 [27]. A single demethylated CpG 
can have a significant effect upon the expression of a par- 
ticular gene, as demonstrated for the CCGG site in the 
third exon of c-myc [28, 29]. Therefore, the use of chang- 
es in the methylation pattern as a prognostic marker in 
cancer is under investigation [30, 31]. We are currently 
investigating if the methylation status of CpG number 7 
(see above) influences the expression of a particular 
gene. Exon prediction in the surrounding genomic 
sequence by computer programs (Genie; GenScan) pre- 
dicted a tentative coding sequence homologous to a fam- 
ily of signal transduction proteins (data not shown). 
Table 1. Methylation status of 21 CpGs in a genomic 
sequence of 750 bp in three patients with pilo- 
cytic astrocytomas 

CpG    Blood, patient No.        Tumor, patient No.
No.    454    493    16026       454    493    16026
 1     n.d.   n.d.   n.d.        n.d.   n.d.   n.d.
 2     n.d.   n.d.   n.d.        n.d.   n.d.   n.d.
 3     -      -      -           X      X      -
 4     -      X      -           -      X      X
 5     -      -      -           -      -      -
 6     -      X      X           -      X      -
 7     -      -      -           X      X      X
 8     -      -      -           X      -      -
 9     -      -      X           X      X      -
10     -      -      -           X      X      -
11     X      -      X           -      X      X
12     X      -      -           -      X      -
13     n.d.   n.d.   n.d.        n.d.   n.d.   n.d.
14     n.d.   n.d.   n.d.        n.d.   n.d.   n.d.
15     -      X      -           X      X      -
16     -      -      -           X      -      -
17     X      -      -           X      -      -
18     -      -      -           -      -      -
19     -      -      -           -      -      -
20     -      -      -           -      -      -
21     X      X      X           X      X      X

Note: If not indicated otherwise (-), all clones tested (5 for 
blood and 10 for tumor) were methylated at this particular 
CpG. 

X, at least one clone was found in which the cytosine was 
demethylated. 



4 Concluding remarks 

DNA fingerprinting has been widely used to demonstrate 
gains and losses of genetic material in tumors as a result 
of the fatal genetic instability associated with cancer. 
Here, we have shown that 2-D DNA fingerprinting can 
also serve as a useful tool in genome-wide screenings for 
methylation differences between constitutional and tumor 
DNAs. Furthermore, this is the first report providing indi- 
cations that hypomethylation may be an important factor 
for the initiation and/or progression of pilocytic astrocyto- 
mas. 

We are most grateful to Vera Kalscheuer, Berlin, for help- 
ing us to establish the bisulfite treatment method in our 
laboratory. This work was supported by the Wilhelm 
Sander Stiftung. 
ABSTRACT 



The present invention relates to a method of sequencing 
DNA, based on the detection of base incorporation by the 
release of pyrophosphate (PPi) and simultaneous enzymatic 
nucleotide degradation. 

17 Claims, 6 Drawing Sheets 
METHOD OF SEQUENCING DNA BASED ON 
THE DETECTION OF THE RELEASE OF 
PYROPHOSPHATE AND ENZYMATIC 
NUCLEOTIDE DEGRADATION 


BACKGROUND OF THE INVENTION 

This invention relates to a method of sequencing DNA, 
based on the detection of base incorporation by the release 
of pyrophosphate (PPi) and simultaneous enzymatic nucle- 
otide degradation. 

DNA sequencing is an essential tool in molecular genetic 
analysis. The ability to determine DNA nucleotide 
sequences has become increasingly important as efforts have 
commenced to determine the sequences of the large 
genomes of humans and other higher organisms. The two 
most commonly used methods for DNA sequencing are the 
enzymatic chain-termination method of Sanger and the 
chemical cleavage technique of Maxam and Gilbert. Both 
methods rely on gel electrophoresis to resolve, according to 
their size, DNA fragments produced from a larger DNA 
segment. Since the electrophoresis step as well as the 
subsequent detection of the separated DNA-fragments are 
cumbersome procedures, a great effort has been made to 
automate these steps. However, despite the fact that auto- 
mated electrophoresis units are commercially available, 
electrophoresis is not well suited for large-scale genome 
projects or clinical sequencing where relatively cost- 
effective units with high throughput are needed. Thus, the 
need for non-electrophoretic methods for sequencing is great 
and several alternative strategies have been described, such 
as scanning tunnel electron microscopy (Driscoll et al., 
1990, Nature, 346, 294-296), sequencing by hybridization 
(Bains et al., 1988, J. Theo. Biol. 135, 303-307) and single 
molecule detection (Jeff et al., 1989, Biomol. Struct. 
Dynamics, 7, 301-306), to overcome the disadvantages of 
electrophoresis. 

Techniques enabling the rapid detection of a single DNA 
base change are also important tools for genetic analysis. In 
many cases detection of a single base or a few bases would 
be a great help in genetic analysis since several genetic 
diseases and certain cancers are related to minor mutations. 
A mini-sequencing protocol based on a solid phase principle 
was described (Hultman, et al., 1988, Nucl. Acid. Res., 17, 
4937-4946; Syvanen et al., 1990, Genomics, 8, 684-692). 
The incorporation of a radiolabeled nucleotide was mea- 
sured and used for analysis of the three-allelic polymor- 
phism of the human apolipoprotein E gene. However, radio- 
active methods are not well suited for routine clinical 
applications and hence the development of a simple non- 
radioactive method for rapid DNA sequence analysis has 
also been of interest. 

Methods of sequencing based on the concept of detecting 
inorganic pyrophosphate (PPi) which is released during a 
polymerase reaction have been described (WO 93/23564 
and WO 89/09283). As each nucleotide is added to a 
growing nucleic acid strand during a polymerase reaction, a 
pyrophosphate molecule is released. It has been found that 
pyrophosphate released under these conditions can be 
detected enzymically e.g. by the generation of light in the 
luciferase-luciferin reaction. Such methods enable a base to 
be identified in a target position and DNA to be sequenced 
simply and rapidly whilst avoiding the need for electro- 
phoresis and the use of harmful radiolabels. 

However, the PPi-based sequencing methods mentioned 
above are not without drawbacks. The template must be 
washed thoroughly between each nucleotide addition to 
remove all non-incorporated deoxynucleotides. This makes 
it difficult to sequence a template which is not bound to a 
solid support. In addition new enzymes must be added with 
each addition of deoxynucleotide. 

Thus, whilst PPi-based methods such as are described 
above do represent an improvement in ease and speed of 
operation, there is still a need for improved methods of 
sequencing which allow rapid detection and provision of 
sequence information and which are simple and quick to 
perform, lending themselves readily to automation. 

We now propose a novel modified PPi-based sequencing 
method in which these problems are addressed and which 
permits the sequencing reactions to be performed without 
intermediate washing steps, enabling the procedure to be 
carried out simply and rapidly, for example in a single 
microtitre plate. Advantageously, there is no need to immo- 
bilise the DNA. Conveniently, and as will be discussed in 
more detail below, the new method of the invention may also 
readily be adapted to permit the sequencing reactions to be 
continuously monitored in real-time, with a signal being 
generated and detected, as each nucleotide is incorporated. 

BRIEF SUMMARY OF THE INVENTION 

In one aspect, the present invention thus provides a 
method of identifying a base at a target position in a sample 
DNA sequence wherein an extension primer, which hybri- 
dises to the sample DNA immediately adjacent to the target 
position is provided and the sample DNA and extension 
primer are subjected to a polymerase reaction in the pres- 
ence of a deoxynucleotide or dideoxynucleotide whereby the 
deoxynucleotide or dideoxynucleotide will only become 
incorporated and release pyrophosphate (PPi) if it is comple- 
mentary to the base in the target position, any release of PPi 
being detected enzymically, different deoxynucleotides or 
dideoxynucleotides being added either to separate aliquots 
of sample-primer mixture or successively to the same 
sample-primer mixture and subjected to the polymerase 
reaction to indicate which deoxynucleotide or dideoxynucle- 
otide is incorporated, characterised in that, a nucleotide- 
degrading enzyme is included during the polymerase reac- 
tion step, such that unincorporated nucleotides are degraded. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic representation of a new DNA 
sequencing method of the invention. The four different 
nucleotides are added stepwise to the template hybridised to 
a primer. The PPi released in the DNA polymerase catalysed 
reaction, is detected by the ATP sulfurylase and luciferase 
catalysed reactions. The height of the signal is proportional 
to the number of bases which have been incorporated. The 
added nucleotides are continuously degraded by a nucleotide 
degrading enzyme. After the first added nucleotide is 
degraded, the next nucleotide can be added. These steps are 
repeated in a cycle and the sequence of the template is 
deduced. 
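
The cycle described for FIG. 1 can be illustrated with an idealized sketch. It is not taken from the patent: the function names and the example sequence are hypothetical, the signal is reduced to the count of bases incorporated per addition, and every unincorporated nucleotide is assumed to be fully degraded before the next one is added.

```python
def simulate_signals(strand_to_synthesize, additions):
    """Idealized signal per nucleotide addition.

    strand_to_synthesize: the bases the polymerase will add, in order
    (i.e. the complement of the template downstream of the primer).
    additions: the nucleotides dispensed one after another.
    Returns a list of (nucleotide, bases_incorporated) pairs; the second
    value stands in for the signal height.
    """
    position = 0
    signals = []
    for nucleotide in additions:
        incorporated = 0
        # A run of identical bases is extended within a single addition.
        while (position < len(strand_to_synthesize)
               and strand_to_synthesize[position] == nucleotide):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


def deduce_sequence(signals):
    """Rebuild the synthesized sequence from the signal heights."""
    return "".join(nucleotide * height for nucleotide, height in signals)


# Hypothetical example: three rounds of stepwise A, C, G, T additions.
signals = simulate_signals("GGATC", "ACGT" * 3)
deduced = deduce_sequence(signals)
```

Here the first G addition gives a signal of height 2 because two G residues are incorporated in one step, and `deduced` reproduces the synthesized strand.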

FIG. 2 shows DNA sequencing on a 35-base long oligo- 
nucleotide template. About 2 pmol of the template/primer 
(E3PN/NUSPT) were incubated with 4 pmol (exo-) Klenow 
and 0.2 U apyrase. The reaction was started by the addition 
of 0.4 nmol of each of the indicated deoxynucleotides and 
the PPi released was detected in real-time by the ELIDA. 
The DNA-sequence of the template is shown in the Figure. 
The experimental conditions are as described in Example 1. 

FIG. 3 shows DNA sequencing on a 35-base-long oligo- 
nucleotide template. About 5 pmol of the template/primer 
(E3PN/NUSPT) were incubated with 8 pmol (exo-) Klenow 
and 0.2 U apyrase. The reaction was started by the addition 
of 0.4 nmol of the indicated deoxynucleotide and the PPi 
released was detected by the ELIDA. The DNA-sequence of 
the template is shown in the Figure. The experimental 
conditions were as described in Example 1. 

FIG. 4 shows DNA sequencing on a 35-base-long oligo- 
nucleotide template. About 5 pmol of the template/primer 
(PEBE25/RIT27) were incubated with 8 pmol (exo-) Kle- 
now and 0.2 U apyrase. The reaction was started by the 
addition of 0.4 nmol of the indicated deoxynucleotide and 
the PPi released was detected by the ELIDA. The DNA- 
sequence of the template is shown in the Figure. The 
experimental conditions were as described in Example 1. 

FIG. 5 shows real-time DNA sequencing performed on a 
160-base-long single-stranded PCR product. About 5 pmol 
of the template/primer (NUSPF) were incubated with 8 
pmol (exo-) Klenow and 0.2 U apyrase. The reaction was 
started by the addition of 0.4 nmol of the indicated deoxy- 
nucleotide and the PPi released was detected by the ELIDA. 
The DNA-sequence after the primer is shown in the Figure. 
The experimental conditions were as described in Example 
1. 

FIG. 6 shows the sequencing method of the invention 
performed on a 130-base-long single-stranded PCR product 
hybridized to the sequencing primer as described in 
Example 2. About 2 pmol of the template/primer was used 
in the assay. The reaction was started by the addition of 0.6 
nmol of the indicated deoxynucleotide and the PPi released 
was detected by the described method. The DNA-sequence 
after the primer is indicated in the Figure. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The term "nucleotide-degrading enzyme" as used herein 
includes all enzymes capable of non-specifically degrading 
nucleotides, including at least nucleoside triphosphates 
(NTPs), but optionally also di- and mono-phosphates, and 
any mixture or combination of such enzymes, provided that 
a nucleoside triphosphatase or other NTP degrading activity 
is present. Although nucleotide-degrading enzymes having a 
phosphatase activity may conveniently be used according to 
the invention, any enzyme having any nucleotide or nucleo- 
side degrading activity may be used, e.g. enzymes which 
cleave nucleotides at positions other than at the phosphate 
group, for example at the base or sugar residues. Thus, a 
nucleoside triphosphate degrading enzyme is essential for 
the invention. Nucleoside di- and/or mono-phosphate 
degrading enzymes are optional and may be used in com- 
bination with a nucleoside tri-phosphate degrading enzyme. 
Suitable such enzymes include most notably apyrase which 
is both a nucleoside diphosphatase and triphosphatase, catal- 
ysing the reactions NTP→NMP+2Pi and NTP→NDP+Pi 
(where NTP is a nucleoside triphosphate, NDP is a nucleo- 
side diphosphate, NMP is a nucleotide monophosphate and Pi 
is phosphate). Apyrase may be obtained from Sigma Chemi- 
cal Company. Other suitable nucleotide triphosphate degrad- 
ing enzymes include Pig Pancreas nucleoside triphosphate 
diphosphohydrolase (Le Bel et al., 1980, J. Biol. Chem., 
255, 1227-1233). Further enzymes are described in the 
literature. 

Different combinations of nucleoside tri-, di- or mono- 
phosphatases may be used. Such enzymes are described in 
the literature and different enzymes may have different 
characteristics for deoxynucleotide degradation, eg. differ- 
ent Km, different efficiencies for different nucleotides etc. 
Thus, different combinations of nucleotide degrading 
enzymes may be used, to increase the efficiency of the 
nucleotide degradation step in any given system. For 
example, in some cases, there may be a problem with 
contamination with kinases which may convert any nucleo- 
side diphosphates remaining to nucleoside triphosphates, 
when a further nucleoside triphosphate is added. In such a 
case, it may be advantageous to include a nucleoside di- 
phosphatase to degrade the nucleoside diphosphates. Advan- 
tageously all nucleotides may be degraded to nucleosides by 
the combined action of nucleoside tri-, di- and monophos- 
phatases. 

Generally speaking, the nucleotide-degrading enzyme is 
selected to have kinetic characteristics relative to the poly- 
merase such that nucleotides are first efficiently incorporated 
by the polymerase, and then any non-incorporated nucle- 
otides are degraded. Thus, for example, if desired the Km of 
the nucleotide-degrading enzyme may be higher than that of 
the polymerase such that nucleotides which are not incor- 
porated by the polymerase are degraded. This allows the 
sequencing procedure to proceed without washing the tem- 
plate between successive nucleotide additions. A further 
advantage is that since washing steps are avoided, it is not 
necessary to add new enzymes eg. polymerase with each 
new nucleotide addition, thus improving the economy of the 
procedure. Thus, the nucleotide-degrading enzyme or 
enzymes are simply included in the polymerase reaction 
mix, and a sufficient time is allowed between each succes- 
sive nucleotide addition for degradation of substantially 
most of the unincorporated nucleotides. The amount of 
nucleotide-degrading enzyme to be used, and the length of 
time between nucleotide additions may readily be deter- 
mined for each particular system, depending on the reactants 
selected, reaction conditions etc. However, it has for 
example been found that the enzyme apyrase may conve- 
niently be used in amounts of 0.25 U/mL to 2 U/mL. 
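
How long to wait between nucleotide additions can be estimated with a back-of-the-envelope sketch. This is an assumption-laden illustration, not a procedure from the patent: it treats degradation as simple first-order decay characterised by a half-life, and the function name and the example values are hypothetical.

```python
import math


def wait_time_seconds(half_life_s, residual_fraction):
    """Time for first-order decay to leave `residual_fraction` of the
    unincorporated nucleotide, given the degradation half-life.

    Assumes exponential decay: remaining = 0.5 ** (t / half_life).
    """
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    if not 0 < residual_fraction <= 1:
        raise ValueError("residual_fraction must be in (0, 1]")
    return half_life_s * math.log2(1.0 / residual_fraction)


# Hypothetical numbers: a 5 s half-life and 1% of the nucleotide left over.
example_wait = wait_time_seconds(5.0, 0.01)
```

Under these assumed values the wait is a little over half a minute; a real system would need the interval determined empirically, as the description states.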

As mentioned above, the nucleotide-degrading enzyme(s) 
may be included during the polymerase reaction step. This 
may be achieved simply by adding the enzyme(s) to the 
polymerase reaction mixture prior to, simultaneously with or 
after the polymerase reaction (ie. the chain extension or 
nucleotide incorporation) has taken place, e.g. prior to, 
simultaneously with, or after, the polymerase and/or nucle- 
otides are added to the sample/primer. 

In one embodiment, the nucleotide-degrading enzyme(s) 
may simply be included in solution in a reaction mix for the 
polymerase reaction, which may be initiated by addition of 
the polymerase or nucleotide(s). 

Alternatively, the nucleotide-degrading enzyme(s) may 
be immobilised on a solid support e.g. a particulate solid 
support (e.g. magnetic beads) or a filter, or dipstick etc. and 
it may be added to the polymerase reaction mixture at a 
convenient time. For example such immobilised enzyme(s) 
may be added after nucleotide incorporation (i.e. chain 
extension) has taken place, and then, when the incorporated 
nucleotides are hydrolysed, the immobilised enzyme may be 
removed from the reaction mixture (e.g. it may be with- 
drawn or captured, e.g. magnetically in the case of magnetic 
beads), before the next nucleotide is added. The procedure 
may then be repeated to sequence more bases. Such an 
arrangement has the advantage that more efficient nucleotide 
degradation may be achieved as it permits more nucleotide 
degrading enzyme to be added for a shorter period. This 
arrangement may also facilitate optimisation of the balance 
between the two competing reactions of DNA polymerisa- 
tion and nucleotide degradation. 

In a further embodiment, the immobilisation of the 
nucleotide-degrading enzyme may be combined with the use 
of the enzyme(s) in solution. For example, a lower amount 
may be included in the polymerase reaction mixture and, 
when necessary, nucleotide-degrading activity may be 
boosted by adding immobilised enzyme as described above. 

The term dideoxynucleotide as used herein includes all 
2'-deoxynucleotides in which the 3'-hydroxyl group is 
absent or modified and thus, while able to be added to the 
primer in the presence of the polymerase, is unable to enter 
into a subsequent polymerisation reaction. 

PPi can be determined by many different methods and a 
number of enzymatic methods have been described in the 
literature (Reeves et al., (1969), Anal. Biochem., 28, 
282-287; Guillory et al., (1971), Anal. Biochem., 39, 
170-180; Johnson et al., (1968), Anal. Biochem., 15, 273; 
Cook et al., (1978), Anal. Biochem. 91, 557-565; and Drake 
et al., (1979), Anal. Biochem. 94, 117-120). 

It is preferred to use luciferase and luciferin in combina- 
tion to identify the release of pyrophosphate since the 
amount of light generated is substantially proportional to the 
amount of pyrophosphate released which, in turn, is directly 
proportional to the amount of base incorporated. The amount 
of light can readily be estimated by a suitable light sensitive 
device such as a luminometer. 

Luciferin-luciferase reactions to detect the release of PPi 
are well known in the art. In particular, a method for 
continuous monitoring of PPi release based on the enzymes 
ATP sulphurylase and luciferase has been developed by 
Nyren and Lundin (Anal. Biochem., 151, 504-509, 1985) 
and termed ELIDA (Enzymatic Luminometric Inorganic 
Pyrophosphate Detection Assay). The use of the ELIDA 
method to detect PPi is preferred according to the present 
invention. The method may however be modified, for 
example by the use of a more thermostable luciferase 
([...] 1170-1171) and/or ATP sulfurylase (Onda et al., 1996, 
Bioscience, Biotechnology and Biochemistry, 60:10, 
1740-42). This method is based on the following reactions: 

             ATP sulphurylase 
PPi + APS  -------------------->  ATP + SO4(2-) 

                          luciferase 
ATP + luciferin + O2  -------------->  AMP + PPi + 
                                       oxyluciferin + CO2 + hv 

(APS = adenosine 5'-phosphosulphate) 

The preferred detection enzymes involved in the PPi 
detection reaction are thus ATP sulphurylase and luciferase. 

The method of the invention may be performed in two 
steps, as described for example in WO93/23564 and WO89/ 
09283, firstly a polymerase reaction step ie. a primer exten- 
sion step, wherein the nucleotide(s) are incorporated, fol- 
lowed by a second detection step, wherein the release of PPi 
is monitored or detected, to detect whether or not a nucle- 
otide incorporation has taken place. Thus, after the poly- 
merase reaction has taken place, samples from the poly- 
merase reaction mix may be removed and analysed by the 
ELIDA eg. by adding an aliquot of the sample to a reaction 
mixture containing the ELIDA enzymes and reactants. 

However, as mentioned above, the method of the inven- 
tion may readily be modified to enable the sequencing (ie. 
base incorporation) reactions to be continuously monitored 

represents a departure from the approach reported in the 
PPi-based sequencing procedures discussed in the literature 
above, in which the chain extension reaction is first per- 
formed separately as a first reaction step, followed by a 
separate "detection" reaction, in which the products of the 
extension reaction are subsequently subjected to the 
luciferin-luciferase based signal generation ("detection") 
reactions. This real time procedure represents a preferred 
embodiment of the invention. 

To carry out this preferred embodiment of the method of 
the invention, the PPi-detection enzyme(s) are included in 
the polymerase reaction step ie. in the chain extension 
reaction step. Thus the detection enzymes are added to the 
reaction mix for the polymerase step prior to, simulta- 
neously with or during the polymerase reaction. In the case 
of an ELIDA detection reaction, the reaction mix for the 
polymerase reaction may thus include at least nucleotide 
(deoxy- or dideoxy), polymerase, luciferin, APS, ATP 
sulphurylase and luciferase together with a nucleotide- 
degrading enzyme. The polymerase reaction may be initi- 
ated by addition of the polymerase or, more preferably the 
nucleotide, and preferably the detection enzymes are already 
present at the time the reaction is initiated, or they may be 
added with the reagent that initiates the reaction. 

This latter embodiment of the present invention thus 
permits PPi release to be detected during the polymerase 
reaction giving a real-time signal. The sequencing reactions 
may be continuously monitored in real-time. A procedure for 
detection of PPi release is thus enabled by the present 
invention. The ELIDA reactions have been estimated to take 
place in less than 2 seconds (Nyren and Lundin, supra). The 
rate limiting step is the conversion of PPi to ATP by ATP 
sulphurylase, while the luciferase reaction is fast and has 
been estimated to take less than 0.2 seconds. Incorporation 
rates for polymerases have also been estimated by various 
methods and it has been found, for example, that in the case 
of Klenow polymerase, complete incorporation of one base 
may take less than 0.5 seconds. Thus, the estimated total 
time for incorporation of one base and detection by ELIDA 
is approximately 3 seconds. It will be seen therefore that 
very fast reaction times are possible, enabling real-time 
detection. The reaction times could further be decreased by 
using a more thermostable luciferase. By using a nucleotide- 
degrading enzyme with a time in the order of seconds for 
degrading half the nucleotides present, an efficient degrada- 
tion can be achieved in time frames from seconds to several 
minutes. 

Thus, the method of the present invention may be per- 
formed in a single reaction step involving an up to 4-enzyme 
or more reaction mixture ie. a multi-enzyme mixture. It is 
surprising that a beneficial and cooperative effect between 
multiple interlinked enzyme reactions may take place 
according to the invention and yield beneficial results. 

A complete sequencing/detection system may therefore be 
based on the following reactions: 

dNTP + (DNA)n  -->  (DNA)n+1 + PPi 

ATP sulfurylase 

in real time. This may simply be achieved by performing the ATP -^r^-^—^-V Light 

chain extension and detection, or signal-generation, reac- 65 Apyrasc 

tions substantially simultaneously by including the "detec- dNTP ■ dNMP + 2Pi 

tion enzymes" in the chain extension reaction mixture. This 



PPi ■ — ATP 
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-continued 

Apyrase 
ATF AMP + 2Fi 

It will be noted that a nucleotide-degrading enzyme such
as apyrase would also degrade the ATP not used in the
luciferase reactions. Thus, all nucleotide triphosphates are
degraded.
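
The degradation of unincorporated nucleotides can be illustrated with a short calculation. The sketch below assumes simple first-order decay, which the specification does not state, and uses a hypothetical helper name; it only shows how a half-time "in the order of seconds" translates into the time needed to reach a chosen residual fraction.

```python
import math


def time_to_residual(half_life_s: float, residual_fraction: float) -> float:
    """Time in seconds for first-order decay to leave `residual_fraction`.

    Assumption (not from the specification): the nucleotide-degrading
    enzyme removes nucleotides with first-order kinetics, so the
    remaining fraction after t seconds is 0.5 ** (t / half_life_s).
    """
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    if not 0 < residual_fraction <= 1:
        raise ValueError("residual_fraction must be in (0, 1]")
    # Solve 0.5 ** (t / half_life) = residual  ->  t = half_life * log2(1 / residual)
    return half_life_s * math.log2(1.0 / residual_fraction)


if __name__ == "__main__":
    # Illustrative values only: a 5 s half-time and a 0.1 % residual.
    print(round(time_to_residual(5.0, 0.001), 1), "seconds")
```

Under this assumption, each additional half-time halves the remaining nucleotide, so a half-time of a few seconds gives near-complete degradation within roughly a minute, consistent with the "seconds to several minutes" range described above.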

Indeed, when PPi release according to the invention is
detected by luciferase-based reactions e.g. ELIDA, this
ATP-degrading activity may be an important advantage,
particularly in "turning off" the light production by the
luciferin/luciferase reaction. This may also be of advantage,
with a low "burn rate" of the luciferase enzyme.

A potential problem which has previously been observed
with PPi-based sequencing methods is that dATP, used in
the sequencing (chain extension) reaction, interferes in the
subsequent luciferase-based detection reaction by acting as
a substrate for the luciferase enzyme. This may be reduced
or avoided by using, in place of deoxy- or dideoxy adenosine
triphosphate (ATP), a dATP or ddATP analogue which is
capable of acting as a substrate for a polymerase but
incapable of acting as a substrate for a said PPi-detection
enzyme.

The term "incapable of acting" includes also analogues 
which are poor substrates for the detection enzymes, or 
which are substantially incapable of acting as substrates, 
such that there is substantially no, negligible, or no signifi- 
cant interference in the PPi detection reaction. 

Thus, a further preferred feature of the invention is the use
of a dATP or ddATP analogue which does not interfere in the
enzymatic PPi detection reaction but which nonetheless may
be normally incorporated into a growing DNA chain by a
polymerase and can also be degraded by the nucleotide
degrading enzymes. By "normally incorporated" is meant
that the nucleotide is incorporated with normal, proper base
pairing. In the preferred embodiment of the invention where
luciferase is the PPi detection enzyme, the preferred
analogues for use according to the invention are the [1-thio]
triphosphate (or α-thiotriphosphate) analogues of deoxy or
dideoxy ATP, preferably deoxyadenosine [1-thio]
triphosphate, or deoxyadenosine α-thiotriphosphate
(dATPαS) as it is also known. dATPαS, along with the
α-thio analogues of dCTP, dGTP and dTTP, may be
purchased from New England Nuclear Labs. Experiments have
shown that substituting dATP with dATPαS allows efficient
incorporation by the polymerase with a low background
signal due to the absence of an interaction between dATPαS
and luciferase. The signal-to-noise ratio is increased by
using a nucleotide analogue in place of dATP, which
eliminates the background caused by the ability of dATP to
function as a substrate for luciferase. In particular, an
efficient incorporation with the polymerase may be achieved
while the background signal due to the generation of light by
the luciferin-luciferase system resulting from dATP
interference is substantially decreased. The dNTPαS analogues of
the other nucleotides may also be used in place of all dNTPs.

The sample DNA (ie. DNA template) may conveniently
be single-stranded, and may either be immobilised on a solid
support or in solution. The use of a nucleotide-degrading
enzyme according to the present invention means that it is
not necessary to immobilise the template DNA to facilitate
washing, since a washing step is no longer required. By
using thermostable enzymes, double-stranded DNA
templates might also be used.

The sample DNA may be provided by any desired source
of DNA, including for example PCR or other amplified
fragments, inserts in vectors such as M13 or plasmids.

In order to repeat the method cyclically and thereby
sequence the sample DNA, and also to aid separation of a
single stranded sample DNA from its complementary strand, 
the sample DNA may optionally be immobilised or provided 
with means for attachment to a solid support. Moreover, the 
amount of sample DNA available may be small and it may 
therefore be desirable to amplify the sample DNA before 
carrying out the method according to the invention. 

The sample DNA may be amplified, and any method of 
amplification may be used, for example in vitro by PCR or 
Self Sustained Sequence Replication (3SR) or in vivo using 
a vector and, if desired, in vitro and in vivo amplification may
be used in combination. Whichever method of amplification
is used the procedure may be modified so that the amplified
DNA becomes immobilised or is provided with means for 
attachment to a solid support. For example, a PCR primer 
may be immobilised or be provided with means for attach- 
ment to a solid support. Also, a vector may comprise means 
for attachment to a solid support adjacent the site of insertion 
of the sample DNA such that the amplified sample DNA and 
the means for attachment may be excised together. 

Immobilisation of the amplified DNA may take place as 
part of PCR amplification itself, as where one or more 
primers are attached to a support, or alternatively one or 
more of the PCR primers may carry a functional group 
permitting subsequent immobilisation, eg. a biotin or thiol 
group. Immobilisation by the 5' end of a primer allows the 
strand of DNA emanating from that primer to be attached to 
a solid support and have its 3' end remote from the support 
and available for subsequent hybridisation with the exten- 
sion primer and chain extension by polymerase. 

The solid support may conveniently take the form of 
microtitre wells, which are advantageously in the conven- 
tional 8x12 format, or dipsticks which may be made of 
polystyrene activated to bind the primer DNA (K. Almer,
Doctoral Theses, Royal Institute of Technology, Stockholm,
Sweden, 1988). However, any solid support may conve- 
niently be used including any of the vast number described 
in the art, eg. for separation/immobilisation reactions or 
solid phase assays. Thus, the support may also comprise 
particles, fibres or capillaries made, for example, of agarose, 
cellulose, alginate, Teflon or polystyrene. Magnetic particles 
eg the superparamagnetic beads produced by Dynal AS 
(Oslo, Norway) also may be used as a support. 

The solid support may carry functional groups such as 
hydroxyl, carboxyl, aldehyde or amino groups, or other 
moieties such as avidin or streptavidin, for the attachment of 
primers. These may in general be provided by treating the 
support to provide a surface coating of a polymer carrying 
one of such functional groups, e.g. polyurethane together 
with a polyglycol to provide hydroxyl groups, or a cellulose 
derivative to provide hydroxyl groups, a polymer or copoly- 
mer of acrylic acid or methacrylic acid to provide carboxyl 
groups or an amino alkylated polymer to provide amino 
groups. U.S. Pat. No. 4,654,267 describes the introduction of
many such surface coatings. 

Accumulation of reaction by-products may take place. 
This may readily be avoided by washing the sample after a 
certain number of reaction cycles e.g. 15-25. Washing may 
be facilitated by immobilising the sample on a solid surface. 

The assay technique is very simple and rapid, thus making 
it easy to automate by using a robot apparatus where a large 
number of samples may be rapidly analysed. Since the 
preferred detection and quantification is based on a lumino- 
metric reaction, this can be easily followed spectrophoto- 
metrically. The use of luminometers is well known in the art 
and described in the literature. 

The pyrophosphate detection method of the present
invention thus opens up the possibility for an automated approach
for large-scale, non-electrophoretic sequencing procedures,
which allow for continuous measurement of the progress of
the polymerisation reaction with time. The method of the
invention also has the advantage that multiple samples may
be handled in parallel.

The target DNA may be cDNA synthesised from RNA in
the sample and the method of the invention is thus applicable
to diagnosis on the basis of characteristic RNA. Such
preliminary synthesis can be carried out by a preliminary
treatment with a reverse transcriptase, conveniently in the
same system of buffers and bases of subsequent PCR steps
if used. Since the PCR procedure requires heating to effect
strand separation, the reverse transcriptase will be
inactivated in the first PCR cycle. When mRNA is the sample
nucleic acid, it may be advantageous to submit the initial
sample, e.g. a serum sample, to treatment with an
immobilised polydT oligonucleotide in order to retrieve all mRNA
via the terminal polyA sequences thereof. Alternatively, a
specific oligonucleotide sequence may be used to retrieve
the RNA via a specific RNA sequence. The oligonucleotide
can then serve as a primer for cDNA synthesis, as described
in WO 89/0982.

Advantageously, the extension primer is sufficiently large
to provide appropriate hybridisation with the sequence
immediately 5' of the target position, yet still reasonably
short in order to avoid unnecessary chemical synthesis. It
will be clear to persons skilled in the art that the size of the
extension primer and the stability of hybridisation will be
dependent to some degree on the ratio of A-T to C-G base
pairings, since more hydrogen bonding is available in a C-G
pairing. Also, the skilled person will consider the degree of
homology between the extension primer and other parts of the
amplified sequence and choose the degree of stringency
accordingly. Guidance for such routine experimentation can
be found in the literature, for example, Molecular Cloning:
a laboratory manual by Sambrook, J., Fritsch E. F. and
Maniatis, T. (1989). It may be advantageous to ensure that
the sequencing primer hybridises at least one base inside
from the 3' end of the template to eliminate blunt-ended
DNA polymerase activity. If separate aliquots are used (ie.
4 aliquots, one for each base), the extension primer is
preferably added before the sample is divided into four
aliquots although it may be added separately to each aliquot.
It should be noted that the extension primer may be identical
with the PCR primer but preferably it is different, to
introduce a further element of specificity into the system.
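
The dependence of hybridisation stability on base composition can be illustrated with a common rule of thumb. The sketch below uses the Wallace rule (2 °C per A-T pair, 4 °C per G-C pair), which is not part of the specification and is only a rough estimate for short oligonucleotides; the function name is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) of a short primer.

    Assumption (not from the specification): the Wallace rule,
    Tm = 2 * (A + T) + 4 * (G + C), a coarse estimate that reflects
    the extra hydrogen bonding in G-C pairs.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Two illustrative primers of equal length but different G-C content.
    for primer in ("ATATATATATATATATAT", "GCGCGCGCGCGCGCGCGC"):
        print(primer, wallace_tm(primer))
```

Two primers of equal length can therefore differ markedly in estimated stability, which is why primer length and stringency are chosen together.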

Alternatively, a primer with a phosphorylated 5'-end,
containing a loop and annealing back on itself and the 3'-end
of the single stranded template can be used. If the 3'-end of
the template has the sequence region denoted T (template),
the primer has the following sequence starting from the
5'-end: P-L-P'-T', where P is primer specific (5 to 30
nucleotides), L is loop (preferably 4 to 10 nucleotides), P' is
complementary to P (preferably 5 to 30 nucleotides) and T'
is complementary to the template sequence in the 3'-end (T)
(at least 4 nucleotides). This primer can then be ligated to the
single stranded template using T4 DNA ligase or a similar
enzyme. This provides a covalent link between the template
and the primer, thus avoiding the possibility that the
hybridised primer is washed away during the protocol.
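
The P-L-P'-T' layout described above can be sketched in a few lines. This is an illustration only: the function names and example sequences are hypothetical, and "complementary" is taken here to mean the reverse complement so that the segments can anneal antiparallel when the primer is written 5' to 3'.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def build_loop_primer(p: str, loop: str, template_3prime_end: str) -> str:
    """Assemble a 5'-P-L-P'-T'-3' loop primer.

    p                   -- primer-specific segment P (5 to 30 nucleotides)
    loop                -- loop segment L (4 to 10 preferred, not enforced)
    template_3prime_end -- region T at the template 3'-end (at least 4 nt)
    """
    if not 5 <= len(p) <= 30:
        raise ValueError("P should be 5 to 30 nucleotides")
    if len(template_3prime_end) < 4:
        raise ValueError("T should be at least 4 nucleotides")
    if set(loop.upper()) - set("ACGT"):
        raise ValueError("loop must contain only A, C, G, T")
    p_prime = reverse_complement(p)                    # anneals back onto P
    t_prime = reverse_complement(template_3prime_end)  # anneals to template T
    return p.upper() + loop.upper() + p_prime + t_prime


if __name__ == "__main__":
    # Hypothetical sequences chosen only to show the segment order.
    print(build_loop_primer("ACGTA", "TTTT", "GGCA"))
```

In this arrangement the template 3'-end annealed to T' lies next to the phosphorylated 5'-end of P, which is the junction the ligase would seal.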

The polymerase reaction in the presence of the extension
primer and a deoxynucleotide is carried out using a
polymerase which will incorporate dideoxynucleotides, e.g. T7
polymerase, Klenow or Sequenase Ver. 2.0 (USB U.S.A.).
Any suitable polymerase may conveniently be used and
many are known in the art and reported in the literature.
However, it is known that many polymerases have a
proof-reading or error checking ability and that 3' ends available
for chain extension are sometimes digested by one or more
nucleotides. If such digestion occurs in the method
according to the invention the level of background noise increases.
In order to avoid this problem, a nonproof-reading
polymerase, eg. exonuclease deficient (exo-) Klenow
polymerase may be used. Otherwise it is desirable to add fluoride
ions or nucleotide monophosphates which suppress 3'
digestion by polymerase. The precise reaction conditions,
concentrations of reactants etc. may readily be determined for
each system according to choice. However, it may be
advantageous to use an excess of polymerase over primer/
template to ensure that all free 3' ends are extended.

In the method of the invention there is a need for a DNA
polymerase with high efficiency in each extension step due
to the rapid increase of background signal which may take
place if templates which are not fully extended accumulate.
A high fidelity in each step is also desired, which can be
achieved by using polymerases with exonuclease activity.
However, this has the disadvantage mentioned above that
primer degradation can occur. Although the
exonuclease activity of the Klenow polymerase is low, we have
found that the 3' end of the primer was degraded with longer
incubations in the absence of nucleotides. An induced-fit
binding mechanism in the polymerisation step selects very
efficiently for binding of the correct dNTP with a net
contribution towards fidelity of 10^-10^.
Exonuclease-deficient polymerases, such as (exo-) Klenow or Sequenase
2.0, catalysed incorporation of a nucleotide which was only
observed when the complementary dNTP was present,
confirming a high fidelity of these enzymes even in the absence
of proof-reading exonuclease activity. The main advantage
of using (exo-) Klenow DNA polymerase over Sequenase
2.0 is its lower Km for nucleotides, allowing a high rate of
nucleotide incorporation even at low nucleotide
concentrations. It is also possible to replace all dNTPs with nucleotide
analogues or non-natural nucleotides such as dNTPαS, and
such analogues may be preferable for use with a DNA
polymerase having exonuclease activity.

In certain circumstances, e.g. with longer sample
templates, it may be advantageous to use a polymerase
which has a lower Km for incorporation of the correct
(matched) nucleotide, than for the incorrect (mismatched)
nucleotide. This may improve the accuracy and efficiency of
the method. Suitable such polymerase enzymes include the
α-polymerase of Drosophila.

In many diagnostic applications, for example genetic 
testing for carriers of inherited disease, the sample will 
contain heterozygous material, that is half the DNA will 
have one nucleotide at the target position and the other half 
will have another nucleotide. Thus if four aliquots are used 
in an embodiment according to the invention, two will show 
a negative signal and two will show half the positive signal. 
It will be seen therefore that it is desirable to quantitatively 
determine the amount of signal detected in each sample. 
Also, it will be appreciated that if two or more of the same 
base are adjacent the 3'-end of the primer a larger signal will 
be produced. In the case of a homozygous sample it will be 
clear that there will be three negative and one positive signal
when the sample is in four aliquots. 
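
The four-aliquot signal patterns described above can be expressed as a small classification routine. This is a sketch under stated assumptions: signals are taken to be normalised against the signal expected for full incorporation of one base, and the 0.25 and 0.75 cut-offs, the function name and the example values are hypothetical choices, not values from the specification. Larger signals caused by runs of identical bases are not distinguished here.

```python
from typing import Mapping, Tuple

BASES = "ACGT"


def classify_genotype(signals: Mapping[str, float],
                      full_signal: float = 1.0) -> Tuple[str, str]:
    """Classify the target position from four per-base aliquot signals.

    Assumed thresholds (illustrative only): a relative signal below 0.25
    counts as negative, 0.25 to 0.75 as a half signal, and 0.75 or more
    as a full positive signal.
    """
    if full_signal <= 0:
        raise ValueError("full_signal must be positive")
    full, half = [], []
    for base in BASES:
        relative = signals[base] / full_signal
        if relative >= 0.75:
            full.append(base)
        elif relative >= 0.25:
            half.append(base)
    if len(full) == 1 and not half:
        # One positive and three negative aliquots: homozygous sample.
        return ("homozygous", full[0] * 2)
    if len(half) == 2 and not full:
        # Two half signals and two negative aliquots: heterozygous sample.
        return ("heterozygous", "".join(half))
    return ("ambiguous", "")


if __name__ == "__main__":
    print(classify_genotype({"A": 0.05, "C": 0.52, "G": 0.00, "T": 0.47}))
    print(classify_genotype({"A": 1.02, "C": 0.03, "G": 0.01, "T": 0.04}))
```

Any pattern that fits neither case is reported as ambiguous rather than forced into a call, which reflects why quantitative measurement of each aliquot is desirable.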

Further to enhance accuracy of the method, bidirectional 
sequencing ie. sequencing of both strands of a double- 
stranded template may be performed. This may be advan- 
tageous e.g. in the sequencing of heterozygous material. 
Conveniently, this may be achieved by immobilising the
double-stranded sample template by one strand, e.g. on
particles or in a microtitre well, eluting the second strand
and subjecting both strands separately to a sequencing
reaction by the method of the invention.

In carrying out the method of the invention, any possible
contamination of the reagents e.g. the NTP solutions, by PPi
is undesirable and may readily be avoided by including a
pyrophosphatase, preferably in low amounts, in the reagent
solutions. Indeed, it is desirable to avoid contamination of
any sort and the use of high purity or carefully purified
reagents is preferred, e.g. to avoid contamination by kinases.

Reaction efficiency may be improved by including Mg2+
ions in the reagent (NTP and/or polymerase) solutions.

It will be appreciated that when the target base
immediately 3'- of the primer has an identical base 3'-thereto, and
the polymerisation is effected with a deoxynucleotide (rather
than a dideoxynucleotide) the extension reaction will add
two bases at the same time and indeed any sequence of
successive identical bases in the sample will lead to
simultaneous incorporation of corresponding bases into the
primer. However, the amount of pyrophosphate liberated
will clearly be proportional to the number of incorporated
bases so that there is no difficulty in detecting such
repetitions.

Since the primer is extended by a single base by the
procedure described above (or a sequence of identical
bases), the extended primer can serve in exactly the same
way in a repeated procedure to determine the next base in the
sequence, thus permitting the whole sample to be sequenced.

As mentioned above, in the method of the invention,
different deoxy- or dideoxynucleotides may be added to
separate aliquots of sample-primer mixture or successively
to the same sample-primer mixture. This covers the
situations where both individual and multiple target DNA
samples are used in a given reaction, which sample DNAs
may be the same or different. Thus, for example, as will be
discussed in more detail below, in certain embodiments of
the invention, there may be one reaction in one container, (in
the sense of one sample DNA, ie. one target DNA sequence,
being extended) whereas in other embodiments different
primer-sample combinations may be present in the same
reaction chamber, but kept separate by e.g. area-selective
immobilisation.

The present invention provides two principal methods of
sequencing immobilised DNA. A. The invention provides a
first method of sequencing sample DNA wherein the sample
DNA is subjected to amplification; the amplified DNA is
optionally immobilised and then subjected to strand
separation, one strand eg. the optionally non-immobilised or
immobilised strand being removed (ie. either strand may be
sequenced), and an extension primer is provided, which
primer hybridises to the sample DNA immediately adjacent
that portion of the DNA to be sequenced; each of four
aliquots of the single stranded DNA is then subjected to a
polymerase reaction in the presence of a deoxynucleotide,
each aliquot using a different deoxynucleotide whereby only
the deoxynucleotide complementary to the base in the target
position becomes incorporated; pyrophosphate released by
base incorporation being identified. After identification of
the incorporated nucleotide a nucleotide degrading enzyme
is added. Upon separating the nucleotide degrading enzyme
from the different aliquots, for example if it is immobilised
on magnetic beads, the four aliquots can be used in a new
cycle of nucleotide additions. This procedure can then be
continuously repeated. B. The invention also provides a
second method of sequencing sample DNA wherein the
sample DNA is subjected to amplification; the amplified
DNA is optionally immobilised and then subjected to strand
separation, one strand eg. the optionally non-immobilised or
immobilised strand being removed, and an extension primer
is provided, which primer hybridises to the sample DNA
immediately adjacent that portion of the DNA to be
sequenced; the single stranded DNA is then subjected to a
polymerase reaction in the presence of a first
deoxynucleotide, and the extent of pyrophosphate release is
determined, non-incorporated nucleotides being degraded
by the nucleotide-degrading enzyme, and the reaction being
repeated by successive addition of a second, third and fourth
deoxynucleotide until a positive release of pyrophosphate
indicates incorporation of a particular deoxynucleotide into
the primer, whereupon the procedure is repeated to extend
[...] determine the base
[...] immediately 3'- of the extended primer at each
stage.

An alternative format for the analysis is to use an array
format wherein samples are distributed over a surface, for
example a microfabricated chip, and thereby an ordered set
of samples may be immobilized in a 2-dimensional format.
Many samples can thereby be analysed in parallel. Using the
method of the invention, many immobilized templates may
be analysed in this way by allowing the solution containing
the enzymes and one nucleotide to flow over the surface and
then detecting the signal produced for each sample. This
procedure can then be repeated. Alternatively, several
different oligonucleotides complementary to the template may
be distributed over the surface followed by hybridization of
the template. Incorporation of deoxynucleotides or
dideoxynucleotides may be monitored for each oligonucleotide by
the signal produced using the various oligonucleotides as
primer. By combining the signals from different areas of the
surface, sequence-based analyses may be performed by four
cycles of polymerase reactions using the various
dideoxynucleotides.

Two-stage PCR (using nested primers), as described in
co-pending application WO90/11369, may be used to
enhance the signal to noise ratio and thereby increase the
sensitivity of the method according to the invention. By such
preliminary amplification, the concentration of target DNA
is greatly increased with respect to other DNA which may be
present in the sample and a second-stage amplification with
[...]
significantly enhances the signal due to the
target DNA relative to the 'background noise'.

Regardless of whether one-stage or two stage PCR is
performed, the efficiency of the PCR is not critical since the
invention relies on the distinct difference different from the
aliquots. However, as mentioned above, it is preferred
[...] qualitative PCR step e.g. by the DIANA method
(Detection of Immobilised Amplified Nucleic Acids) as
described in WO90/11369 as a check for the presence or
absence of amplified DNA.

Any suitable polymerase may be used, although it is
preferred to use a thermophilic enzyme such as Taq
polymerase to permit the repeated temperature cycling without
having to add further polymerase, e.g. Klenow fragment, in
each cycle of PCR.

PCR has been discussed above as a preferred method of
initially amplifying target DNA although the skilled person
will appreciate that other methods may be used instead of or in
combination with PCR. A recent development in
amplification techniques which does not require temperature cycling
or use of a thermostable polymerase is Self Sustained
Sequence Replication (3SR). 3SR is modelled on retroviral
replication and may be used for amplification (see for
example Gingeras, T. R. et al PNAS (USA) 87:1874-1878
and Gingeras, T. R. et al PCR Methods and Applications Vol.
1, pp 25-33).

As indicated above, the method can be applied to
identifying the release of pyrophosphate when dideoxynucleotide
residues are incorporated into the end of a DNA chain.
WO93/23562 relates to a method of identification of the
base in a single target position in a DNA sequence
(mini-sequencing) wherein sample DNA is subjected to
amplification; the amplified DNA is immobilised and then
subjected to strand separation, the non-immobilised strand
being removed and an extension primer, which hybridises to
the immobilised DNA immediately adjacent to the target
position, is provided; each of four aliquots of the
immobilised single stranded DNA is then subjected to a polymerase
reaction in the presence of a dideoxynucleotide, each aliquot
using a different dideoxynucleotide whereby only the
dideoxynucleotide complementary to the base in the target
position becomes incorporated; the four aliquots are then
subjected to extension in the presence of all four
deoxynucleotides, whereby in each aliquot the DNA which
has not reacted with the dideoxynucleotide is extended to
form double stranded DNA while the dideoxy-blocked DNA
remains as single stranded DNA; followed by identification
of the double stranded and/or single stranded DNA to
indicate which dideoxynucleotide was incorporated and
hence which base was present in the target position. Clearly,
the release of pyrophosphate in the chain terminating
dideoxynucleotide reaction will indicate which base was
incorporated but the relatively large amount of
pyrophosphate released in the subsequent deoxynucleotide primer
extension reactions (so-called chase reactions) gives a much
larger signal and is thus more sensitive.

It will usually be desirable to run a control with no
dideoxynucleotides and a 'zero control' containing a mixture
of all four dideoxynucleotides.

WO93/23562 defines the term dideoxynucleotide as
including 3'-protected 2'-deoxynucleotides which act in the
same way by preventing further chain extension. However,
if the 3' protecting group is removable, for example by
hydrolysis, then chain extension (by a single base) may be
followed by unblocking at the 3' position, leaving the
extended chain ready for a further extension reaction. In this
way, chain extension can proceed one position at a time
without the complication which arises with a sequence of
identical bases, as discussed above. Thus, the methods A and
B referred to above can be modified whereby the base added
at each stage is a 3'-protected 2'-deoxynucleotide and after
the base has been added (and the light emission detected),
the 3'-blocking group is removed to permit a further
3'-protected 2'-deoxynucleotide to be added. Suitable
protecting groups include acyl groups such as alkanoyl
groups e.g. acetyl or indeed any hydroxyl protecting groups
known in the art, for example as described in Protective
Groups in Organic Chemistry, JFW McOmie, Plenum Press,
1973.

The invention, in the above embodiment, provides a
simple and rapid method for detection of single base
changes. In one format it successfully combines two
techniques: solid-phase technology (DNA bound to magnetic
beads) and an Enzymic Luminometric Detection Assay
(ELIDA). The method can be used to both identify and
quantitate selectively amplified DNA fragments. It can also
[...]
polymorphic gene fragment. This means that the method can
be used to screen for rare point mutations responsible for
both acquired and inherited diseases, identify DNA
polymorphisms, and even differentiate between
drug-resistant and drug-sensitive strains of viruses or bacteria
without the need for centrifugations, filtrations, extractions
or electrophoresis. The simplicity of the method renders it
suitable for many medical (routine analysis in a wide range
of inherited disorders) and commercial applications.

The positive experimental results presented below clearly
show the method of the invention is applicable to an on-line
automatic non-electrophoretic DNA sequencing approach,
with step-wise incorporation of single deoxynucleotides.
After amplification to yield single-stranded DNA and
annealing of the primer, the template/primer-fragment is
used in a repeated cycle of dNTP incubations. Samples are
continuously monitored in the ELIDA. As the synthesis of
DNA is accompanied by release of inorganic pyrophosphate
(PPi) in an amount equal to the amount of nucleotide
incorporated, signals in the ELIDA are observed only when
complementary bases are incorporated. Due to the ability of
the method to determine PPi quantitatively, it is possible to
distinguish incorporation of a single base from two or
several simultaneous incorporations. Since the DNA
template is preferably obtained by PCR, it is relatively
straightforward to increase the amount of DNA needed for such an
assay.

As mentioned above our results open the possibility for a
novel approach for large-scale non-electrophoretic DNA
sequencing, which allows for continuous determination of
the progress of the polymerisation reaction with time. For
success of such an approach there is a need for high
efficiency of the DNA polymerase due to the rapid increase
of background signal if templates accumulate which are not
[...]. The new approach has several advantages as
compared to standard sequencing methods. Firstly, the
method is suitable for handling of multiple samples in
parallel. Secondly, relatively cost-effective instruments can
be envisioned. In addition, the method avoids the use of
electrophoresis and thereby the loading of samples and
casting of gels.

A further advantage of the method of the present
invention is that it may be used to resolve sequences which cause
compressions in the gel-electrophoretic step in standard
Sanger sequencing protocols.

The method of the invention may also find applicability in
other methods of sequencing. For example, a number of
iterative sequencing methods, advantageously permitting
sequencing of double-stranded targets, based on ligation of
probes or adaptors and subsequent cleavage have been
described (see e.g. U.S. Pat. No. 5,599,675 and Jones,
BioTechniques 22: 938-946, 1997). Such methods generally
involve ligating a double stranded probe (or adaptor)
containing a Class IIS nuclease recognition site to a double
stranded target (sample) DNA and cleaving the probe/
adaptor-target complex at a site within the target DNA, one
or more nucleotides from the ligation site, leaving a
shortened target DNA. The ligation and cleavage cycle is then
repeated. Sequence information is obtained by identifying
one or more nucleotides at the terminus of the target DNA.
The identification of the terminal nucleotide(s) may be
achieved by chain extension using the method of the present
invention.

Further to permit sequencing of a double stranded DNA,

be used for detection of single base substitutions and for the method of the invention may be used in a sequencing 

estimation of the heterozygosity index for an amplified protocol based on strand displacement, e.g. by the introduc- 
tion of nicks, for example as described by Fu et al., in Nucleic Acids Research
1997, 25(3): 677-679. In such a method the sample DNA may be modified by
ligating a double-stranded probe or adaptor sequence which serves to introduce
a nick e.g. by containing a non- or mono-phosphorylated or dideoxy nucleotide.
Use of a strand-displacing polymerase permits a sequencing reaction to take
place by extending the 3' end of probe/adaptor at the nick, nucleotide
incorporation being detected according to the method of the present invention.

Advantageously, the method according to the present invention may be combined
with the method taught in WO93/23563 which uses PCR to introduce loop
structures which provide a permanently attached 3' primer at the 3' terminal of
a DNA strand of interest. For example, in such a modified method, the extension
primer is introduced as part of the 3'-terminal loop structure onto a target
sequence of one strand of double stranded DNA which contains the target
position, said target sequence having a region A at the 3'-terminus thereof and
there being optionally a DNA region B which extends 3' from region A, whereby
said double-stranded DNA is subjected to polymerase chain reaction (PCR)
amplification using a first primer hybridising to the 3'-terminus of the
sequence complementary to the target sequence, which first primer is
immobilised or provided with means for attachment to a solid support, and a
second primer having a 3'-terminal sequence which hybridises to at least a
portion of A and/or B of the target sequence while having at its 5'-end a
sequence substantially identical to A, said amplification producing
double-stranded target DNA having at the 3'-end of the target sequence, in the
following order, the region A, a region capable of forming a loop and a
sequence A' complementary to sequence A, whereafter the amplified
double-stranded DNA is subjected in immobilised form to strand separation
whereby the non-immobilised target strand is liberated and region A' is
permitted or caused to hybridise to region A, thereby forming said loop. The 3'
end of region A' hybridises immediately adjacent the target position. The
dideoxy and/or extension reactions use the hybridised portion as a primer.

The method of the invention may also be used for real-time detection of known
single-base changes. This concept relies on the measurement of the difference
in primer extension efficiency by a DNA polymerase of a matched over a
mismatched 3' terminal. The rate of the DNA polymerase catalyzed primer
extension is measured by the ELIDA as described previously. The PPi formed in
the polymerization reaction is converted to ATP by ATP sulfurylase and the ATP
production is continuously monitored by the firefly luciferase. In the
single-base detection assay, single-stranded DNA fragments are used as
template. Two detection primers differing with one base at the 3'-end are
designed; one precisely complementary to the non-mutated DNA-sequence and the
other precisely complementary to the mutated DNA-sequence. The primers are
hybridized with the 3'-termini over the base of interest and the primer
extension rates are, after incubation with DNA polymerase and deoxynucleotides,
measured with the ELIDA. If the detection primer exactly matches to the
template a high extension rate will be observed. In contrast, if the 3'-end of
the detection primer does not exactly match to the template (mismatch) the
primer extension rate will be much lower. The difference in primer extension
efficiency by the DNA polymerase of a matched over a mismatched 3'-terminal can
then be used for single-base discrimination. Thus, the presence of the mutated
DNA sequence can be distinguished over the non-mutated sequence. The relative
mismatch extension efficiencies may be strongly decreased by substituting the
α-thiotriphosphate analog for the next correct natural deoxynucleotide after
the 3'-mismatch termini. By performing the assay in the presence of a
nucleotide degrading enzyme, it is easier to distinguish between a match and a
mismatch of the type that are easy to extend, such as A:T, T:G and C:T.

The invention also comprises kits for use in methods of the invention which
will normally include at least the following components:

(a) a test specific primer which hybridises to sample DNA so that the target
position is directly adjacent to the 3' end of the primer;

(b) a polymerase;

(c) detection enzyme means for identifying pyrophosphate release;

(d) a nucleotide-degrading enzyme;

(e) deoxynucleotides, or optionally deoxynucleotide analogues, optionally
including, in place of dATP, a dATP analogue which is capable of acting as a
substrate for a polymerase but incapable of acting as a substrate for a said
PPi-detection enzyme; and

(f) optionally dideoxynucleotides, or optionally dideoxynucleotide analogues,
optionally ddATP being replaced by a ddATP analogue which is capable of acting
as a substrate for a polymerase but incapable of acting as a substrate for a
said PPi-detection enzyme.

If the kit is for use with initial PCR amplification then it will also normally
include at least the following components:

(i) a pair of primers for PCR, at least one primer having means permitting
immobilisation of said primer;

(ii) a polymerase which is preferably heat stable, for example Taq1 polymerase;

(iii) buffers for the PCR reaction; and

(iv) deoxynucleotides.

The invention will now be described by way of a non-limiting Example with
reference to the drawings.

EXAMPLE 1

MATERIALS AND METHODS

Synthesis and purification of oligonucleotides

The oligonucleotides PEBE25 SEQ ID NO:1 (35-mer:
5'-GCAACGTCGCCACACACAACATACGAGCCGGAAGG-3'), RIT 27 SEQ ID NO:2 (23-mer:
5'-GCTTCCGGCTCGTATGTTGTGTG-3'), E3PN SEQ ID NO:3 (35-mer:
5'-GCTGGAATTCGTCAGACTGGCCGTCGTTTTACAAC-3'), NUSPT SEQ ID NO:4 (17-mer:
5'-GTAAAACGACGGCCAGT-3'), RIT 203 SEQ ID NO:5 (51-mer:
5'-AGCTTGGGTTCGAGGAGATCTTCCGGGTTACGGCGGAAGATCTCCTCGAGG-3'), RIT 204 SEQ ID
NO:6 (51-mer: 5'-AGCTCCTCGAGGAGATCTTCCGCCGTAACCCGGAAGATCTCCTCGAACCCA-3'),
ROMO 205S SEQ ID NO:7 (25-mer: 5'-CGAGGAGATCTTCCGGGTTACGGCG-3'), ROMO 205B
SEQ ID NO:8 (25-mer: 5'-biotin-CGAGGAGATCTTCCGGGTTACGGCG-3'), RIT 28, RIT 29,
and USP (Hultman et al., 1990, Nucleic Acids Research, 18, 5107-5112) were
synthesised by phosphoramidite chemistry on an automated DNA synthesis
apparatus (Gene Assembler Plus, Pharmacia Biotech, Uppsala, Sweden).
Purification was performed on a fast protein liquid chromatography pepRPC 5/5
column (Pharmacia Biotech).

In Vitro Amplification and Template Preparation

PCR reactions were performed on the multilinker of plasmid pRIT 28 with 7.5
pmol of general primers, RIT 28
and RIT 29 (biotinylated), according to Hultman et al. (supra). The
biotinylated PCR products were immobilised onto streptavidin-coated super
paramagnetic beads Dynabeads™ M280-Streptavidin, or M450-Streptavidin (Dynal,
A. S., Oslo, Norway). Single-stranded DNA was obtained by removing the
supernatant after incubation of the immobilised PCR product in 0.10 M NaOH for
5 minutes. Washing of the immobilised single-stranded DNA and hybridization to
sequencing primers was carried out as described earlier (Nyren et al., 1993,
Anal. Biochem. 208, 171-175).

Construction of the Hairpin Vector pRIT 28HP and Preparation of PCR Amplified
Template

The oligonucleotides RIT 203, and RIT 204 were hybridized and ligated to
HindIII (Pharmacia Biotech) pre-restricted plasmid pRIT 28 (the obtained
plasmid was named pRIT 28HP). PCR reaction was performed on the multilinker of
plasmid pRIT 28HP with 7.5 pmol of primer pairs, RIT 29/ROMO 205S or RIT
27/ROMO 205B, 200 µM dNTP, 20 mM Tris-HCl (pH 8.7), 2 mM MgCl2, 0.1% Tween 20,
and 1 unit AmpliTaq DNA polymerase making up a final volume of 50 µl. The
temperature profile included a 15 second denaturation step at 95° C. and a 90
second hybridization/extension step at 72° C. These steps were repeated 35
times with a GeneAmp PCR System, 9600 (Perkin, Elmer, Emeryville, USA). The
immobilised (as described above) single-stranded DNA obtained from the RIT
29/ROMO 205S amplified reaction or the non-biotinylated single-stranded DNA
fragment from the RIT 27/ROMO 205B amplified reaction, was allowed to hybridize
at 65° C. for 5 minutes in 20 mM Tris-HCl (pH 7.5), 8 mM MgCl2 to make a
self-priming loop structure.

DNA Sequencing

The oligonucleotides E3PN, PEBE25, and the above described PCR products were
used as templates for DNA sequencing. The oligonucleotides E3PN, PEBE25, and
single-stranded RIT 28/RIT 29 amplified PCR product were hybridized to the
primers NUSPT, RIT 27, and NUSPT, respectively. The hybridized DNA-fragments,
or the self-primed loop structures, were incubated with either a modified T7
DNA polymerase (Sequenase 2.0; U.S. Biochemical, Cleveland, Ohio, USA), or
exonuclease deficient (exo-) Klenow DNA polymerase (Amersham, UK). The
sequencing procedure was carried out by stepwise elongation of the primer
strand upon sequential addition of the different deoxynucleoside triphosphates
(Pharmacia Biotech), and simultaneous degradation of nucleotides by apyrase.
The PPi released due to nucleotide incorporation was detected by the ELIDA. The
produced ATP and the non-incorporated deoxynucleotide were degraded in
real-time by apyrase. The luminescence was measured using an LKB 1250
luminometer connected to a potentiometric recorder. The luminometer was
calibrated to give a response of 10 mV for the internal light standard. The
luminescence output was calibrated by the addition of a known amount of ATP or
PPi. The standard assay volume was 0.2 ml and contained the following
components: 0.1 M Tris-acetate (pH 7.75), 2 mM EDTA, 10 mM magnesium acetate,
0.1% bovine serum albumin, 1 mM dithiothreitol, 2 µM adenosine
5'-phosphosulfate (APS), 0.4 mg/ml polyvinylpyrrolidone (360 000), 100 µg/ml
D-luciferin (Bio Therma, Dalaro, Sweden), 4 µg/ml L-luciferin (Bio Therma,
Dalaro, Sweden), 120-240 mU/ml ATP sulfurylase (ATP: sulfate adenylyl
transferase; EC 2.7.7.4) (Sigma Chemical Co.), 100-400 mU apyrase (nucleoside
5'-triphosphatase and nucleoside 5'-diphosphatase; EC 3.6.1.5) (Sigma Chemical
Co.), purified luciferase (Sigma Chemical Co.) in an amount giving a response
of 200 mV for 0.1 µM ATP. One to five pmol of the DNA-fragment, and 3 to 15
pmol DNA polymerase were added to the solution described above. The sequencing
reaction was started by adding 0.2-1.0 nmol of one of the deoxynucleotides
(Pharmacia Biotech). The reaction was carried out at room temperature.

Conventional DNA Sequencing

The sequencing data obtained from the new DNA sequencing were confirmed by
semiautomated solid-phase sequencing using radioactive labelled terminators
(Hultman et al., 1991, BioTechniques, 10, 84-93). The produced Sanger
fragments, from the loop-structured PCR product, were restricted by Bgl II
restriction endonuclease prior to gel loading.

RESULTS

Principle of the DNA Sequencing Method

The principle of the new sequencing method is illustrated in FIG. 1. A specific
DNA-fragment of interest (sequencing primer hybridized to a single-stranded DNA
template, or self-primed single-stranded product) is incubated with DNA
polymerase, ATP sulfurylase, luciferase and a nucleotide degrading enzyme, and
a repeated cycle of nucleotide incubation is performed. The synthesis of DNA is
accompanied by release of PPi equal in molarity to that of the incorporated
nucleotide. Thereby, real-time signals are obtained by the enzymatic inorganic
pyrophosphate detection assay (ELIDA) only when complementary bases are
incorporated. In the ELIDA the produced PPi is converted to ATP by ATP
sulfurylase and the amount of ATP is then determined by the luciferase assay
(FIG. 1). As added nucleotides are continuously degraded by a nucleotide
degrading enzyme, a new nucleotide can be added after a specific time-interval.
From the ELIDA results the sequence after the primer is deduced. The DNA
sequencing method of the invention is named "pyrosequencing".

Optimization of the Method

Several different parameters of the new DNA sequencing approach were optimised
in a model system using a synthetic DNA template. As the method is based on
utilization of added deoxynucleotides by the DNA polymerase, detection of
released PPi by a coupled enzymatic system and continuous degradation of
nucleotides, the concentration of the different components used in the assay
should be carefully balanced.

The signal-extent as a function of the numbers of correct deoxynucleotides
added is shown in FIG. 2. The reaction was started by addition of the three
first correct bases (dCTP, dTTP and dGTP) and the trace shows both the release
of PPi (converted to ATP by the ATP sulfurylase) during the incorporation of
the bases, and the subsequent degradation of ATP. The incorporation of three
residues was noted. After a short time-lag (the apyrase reaction was allowed to
proceed about 2 minutes), dATPαS was added; a signal corresponding to
incorporation of one residue was observed. Thereafter, the two next correct
deoxynucleotides (dCTP and dGTP) were added. This time the incorporation of two
residues was detected. The results illustrated in FIG. 2 show that the DNA
sequencing approach functions; the added deoxynucleotides were degraded by
apyrase between each addition, the observed signals were proportional to the
amount of nucle-
otide incorporated, and no release of PPi was observed if a non-complementary
base was added (not shown).

In the above illustrated experiment, 32 mU ATP sulfurylase, 200 mU apyrase, 2 U
(exo-) Klenow, 2 pmol template/primer, and 0.4 pmol deoxynucleotides, were
used. Similar results were obtained (not shown) when the different compounds
were varied within the interval: 24-48 mU ATP sulfurylase, 100-400 mU apyrase,
1-5 U (exo-) Klenow, 1-5 pmol template/primer, and 0.2-1.0 nmol
deoxynucleotides. It may be important to use an excess of polymerase over
primer/template to be sure that all free 3' ends are extended. It may also be
important that the sequencing primer hybridize at least one base inside from
the 3' end of the template to eliminate blunt-end DNA polymerase activity
(Clark, 1991, Gene, 104, 75-80).

DNA Sequencing

In the next series of experiments two different synthetic templates as well as
a PCR product were sequenced in order to investigate the feasibility of the new
approach. FIGS. 3 and 4 show the result from DNA sequencing performed on two
different synthetic templates. Both templates were sequenced to the end, and in
both cases the true sequence could be determined. When the polymerase reaches
the end of the template, the signal strongly decreases indicating slower
polymerization for the last bases. The signal was not decreased to the same
extent if a longer template was sequenced (FIG. 5). The small signals observed
when non-complementary bases were added are due to PPi contamination in the
nucleotide solutions. The later increase of this background signal (false
signals) is probably due to nucleoside diphosphate kinase activity
(contamination in the ATP sulfurylase preparation from Sigma). The nucleoside
diphosphate kinase converts non-degraded deoxynucleoside diphosphates to
deoxynucleoside triphosphates when a new deoxynucleotide triphosphate is added.
The formed deoxynucleoside triphosphate can then be incorporated into the
growing primer. This effect was especially obvious when the synthetic template
E3PN was sequenced. When the first correct nucleotide (dCTP) is added some of
the non-degraded dTDP is converted to dTTP. After dCMP has been incorporated
some of the formed dTTP can be incorporated. This out-of-phase obtained DNA can
be further extended when dGTP is added. This is clearly shown when the
out-of-phase DNA has reached the position where two A should be incorporated.
The false signal is now stronger. The following double T and C also give
stronger signals whereas the next single A gives a lower signal. In FIG. 5, DNA
sequencing of 20 bases of a 160-base-long self-primed single-stranded PCR
product is shown. The obtained sequence was confirmed by semiautomatic
solid-phase Sanger sequencing (data not shown). The main reason for the
sequencing to come out of phase is a combination of slow degradation of
deoxynucleoside diphosphates (at least some of the dNDPs) by the potato apyrase
(Liebecq, C., Lallemand, A., and Deguldre-Guillaume, M. J. (1963) Bull. Soc.
Chim. Biol. 45, 573-594) and the deoxynucleoside diphosphate kinase
contamination in the ATP sulfurylase preparation obtained from Sigma. It is
possible to overcome this problem by using a pure preparation of ATP
sulfurylase, or by using more efficient dNDP degrading enzymes (Doremus, H. D.
and Blevins, D. G. (1988) Plant Physiol. 87(1), 41-45). Even if a pure
preparation of ATP sulfurylase is used it could be an advantage to use
combinations of nucleotide degrading enzymes (NTPase, NDPase, NMPase) to
increase the rate of the degradation process and to decrease the thermodynamic
equilibrium concentration of dNTPs. In addition, it could be an advantage to
use an enzyme with low Km for dNTPs such as the Pig Pancreas nucleoside
triphosphate diphosphohydrolase (Le Bel, D., Piriet, G. G., Phaneuf, S.,
St-Jean, P., Laliberte, J. F. and Beudoin, A. R. (1980) J. Biol. Chem. 255,
1227-1233; Laliberte, J. F., St-Jean, P. and Beudoin, R. (1982) J. Biol. Chem.
257, 3869-3871).

EXAMPLE 2

PyroSequencing™ on a PCR Product

The biotinylated PCR products were immobilized onto streptavidin-coated super
paramagnetic beads Dynabeads™ M280-Streptavidin (Dynal). Elution of
single-stranded DNA and hybridization of sequencing primer (JA 80
5'-GATGGAAACCAAAAATGATAGG-3') SEQ ID NO:9 was carried out as described earlier
(T. Hultman, M. Murby, S. Stahl, E. Hornes, M. Uhlen, Nucleic Acids Res. 18:
5107 (1990)). The hybridized template/primer were incubated with Sequenase 2.0
DNA polymerase (Amersham). The sequencing procedure was carried out by stepwise
elongation of the primer-strand upon sequential addition of the different
deoxynucleoside triphosphates (Pharmacia Biotech), and simultaneous degradation
of nucleotides by apyrase. The apyrase was grade VI, high ATPase/ADPase ratio
(nucleoside 5'-triphosphatase and nucleoside 5'-diphosphatase; EC 3.6.1.5)
(Sigma Chemical Co.). The sequencing reaction was performed at room temperature
and started by adding 0.6 nmol of one of the deoxynucleotides (Pharmacia
Biotech). The PPi released due to nucleotide incorporation was detected as
described earlier (see e.g. Example 1). The JA80 was synthesized by
phosphoramidite chemistry (Interactiva). The sequencing data obtained from the
PyroSequencing method was confirmed by semi-automated solid-phase Sanger
sequencing according to Hultman et al. (T. Hultman, M. Murby, S. Stahl, E.
Hornes, M. Uhlen, Nucleic Acids Res. 18: 5107 (1990)). The reaction was carried
out at room temperature. The results are shown in FIG. 6.
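
The stepwise nucleotide-addition scheme described in the Examples can be
illustrated with a minimal Python sketch. It is an idealised model, not the
patentees' procedure: it assumes that every dispensed nucleotide is either
incorporated or completely degraded before the next addition, that the signal
equals the number of residues incorporated, and that nothing runs out of phase.
The function names, the template and the dispensation order are hypothetical.

```python
# Idealised model of sequencing by stepwise nucleotide addition.
# Assumptions (not from the source): complete degradation of unincorporated
# nucleotide between additions, and signal == number of bases incorporated.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flowgram(template, dispensation_order):
    """Return (nucleotide, signal) pairs for each nucleotide addition.

    `template` is the template strand downstream of the primer, written in
    the order in which the polymerase reads it.
    """
    position = 0
    flowgram = []
    for nucleotide in dispensation_order:
        incorporated = 0
        # Extend through a run of identical template bases in one addition.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        flowgram.append((nucleotide, incorporated))
    return flowgram


def decode_flowgram(flowgram):
    """Rebuild the synthesised strand from the per-addition signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flowgram)


if __name__ == "__main__":
    flows = simulate_flowgram("GATTC", "ACGTACGT")
    print(flows)
    print(decode_flowgram(flows))
```

In this toy run a non-complementary addition gives a zero signal, and the run
of two identical template bases gives a signal of two in a single addition,
which mirrors the proportionality reported for FIG. 2.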



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS : 9 

<210> SEQ ID NO 1

<211> LENGTH: 35 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (PEBE25)



<400> SEQUENCE: 1 

gcaacgtcgc cacacacaac atacgagccg gaagg 35 

<210> SEQ ID NO 2 

<211> LENGTH: 23

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (RIT 27) 

<400> SEQUENCE: 2 

gcttccggct cgtatgttgt gtg 23 



<210> SEQ ID NO 3 

<211> LENGTH: 35 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (E3PN) 

<400> SEQUENCE: 3

gctggaattc gtcagactgg ccgtcgtttt acaac 35 



<210> SEQ ID NO 4 

<211> LENGTH: 17 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (NUSPT) 

<400> SEQUENCE: 4

gtaaaacgac ggccagt 17 



<210> SEQ ID NO 5 

<211> LENGTH: 51 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (RIT 203) 

<400> SEQUENCE: 5

agcttgggtt cgaggagatc ttccgggtta cggcggaaga tctcctcgag g 51 



<210> SEQ ID NO 6 

<211> LENGTH: 51 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (RIT 204)

<400> SEQUENCE: 6

agctcctcga ggagatcttc cgccgtaacc cggaagatct cctcgaaccc a 51 



<210> SEQ ID NO 7 

<211> LENGTH: 25 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (ROMO 205S) 

<400> SEQUENCE: 7 

cgaggagatc ttccgggtta cggcg 25 



<210> SEQ ID NO 8 

<211> LENGTH: 25 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (ROMO 205B) 

<221> NAME/KEY: modified_base

<222> LOCATION: (1)...(1) 

<223> OTHER INFORMATION: biotin 

<400> SEQUENCE: 8 

cgaggagatc ttccgggtta cggcg 25 



<210> SEQ ID NO 9 

<211> LENGTH: 22 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Oligonucleotide (JA 80) 

<400> SEQUENCE: 9

gatggaaacc aaaaatgata gg 22 
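
As a consistency check on the listing above, the following Python sketch
compares each declared length with the sequence given, and tests two
relationships stated in the Examples: that NUSPT can hybridise to E3PN, and
that RIT 203 and RIT 204 can hybridise to each other. The helper names are
hypothetical, and "hybridise" is reduced here to exact reverse-complement
matching, which is a simplifying assumption.

```python
# Sequences and declared lengths copied from the SEQUENCE LISTING.
LISTING = {
    "PEBE25": ("gcaacgtcgccacacacaacatacgagccggaagg", 35),
    "RIT 27": ("gcttccggctcgtatgttgtgtg", 23),
    "E3PN": ("gctggaattcgtcagactggccgtcgttttacaac", 35),
    "NUSPT": ("gtaaaacgacggccagt", 17),
    "RIT 203": ("agcttgggttcgaggagatcttccgggttacggcggaagatctcctcgagg", 51),
    "RIT 204": ("agctcctcgaggagatcttccgccgtaacccggaagatctcctcgaaccca", 51),
    "ROMO 205S": ("cgaggagatcttccgggttacggcg", 25),
    "ROMO 205B": ("cgaggagatcttccgggttacggcg", 25),
    "JA 80": ("gatggaaaccaaaaatgatagg", 22),
}

PAIRS = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Return the reverse complement of a lower-case DNA string."""
    return "".join(PAIRS[base] for base in reversed(seq))


def length_mismatches(listing):
    """Names whose sequence length differs from the declared LENGTH."""
    return [name for name, (seq, declared) in listing.items()
            if len(seq) != declared]


def anneals_to(primer, template):
    """True if the primer is exactly complementary to part of the template."""
    return reverse_complement(primer) in template


if __name__ == "__main__":
    print("length mismatches:", length_mismatches(LISTING))
    print("NUSPT anneals to E3PN:",
          anneals_to(LISTING["NUSPT"][0], LISTING["E3PN"][0]))
    rit203, rit204 = LISTING["RIT 203"][0], LISTING["RIT 204"][0]
    # Compare the two oligonucleotides with the first four bases set aside.
    print("RIT 203/204 duplex with 4-base 5' overhangs:",
          reverse_complement(rit204)[:47] == rit203[4:])
```

The last check treats the first four bases (agct) of each oligonucleotide as
single-stranded overhangs, which would be consistent with ligation into a
HindIII-restricted plasmid as described in Example 1.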



What is claimed is: 

1. A method of identifying a base at a target position in a
sample DNA sequence comprising providing a sample DNA
sequence and an extension primer, which hybridizes to the
sample DNA immediately adjacent to the target position and
subjecting the sample DNA and extension primer to a
polymerase reaction in the presence of a deoxynucleotide or
dideoxynucleotide whereby the deoxynucleotide or
dideoxynucleotide will only become incorporated and
release pyrophosphate (PPi) if it is complementary to the
base in the target position, and detecting any release of PPi
enzymically, different deoxynucleotides or dideoxynucleotides
being added either to separate aliquots of sample-primer
mixture or successively to the sample-primer mixture
and subjected to the polymerase reaction to indicate which
deoxynucleotide or dideoxynucleotide is incorporated,
wherein a nucleotide-degrading enzyme is included during
the polymerase reaction step, such that unincorporated
nucleotides are degraded, and whereby any release of PPi is
indicative of incorporation of deoxynucleotide or
dideoxynucleotide and the identification of a base complementary
thereto.

2. A method as claimed in claim 1, wherein the
nucleotide-degrading enzyme is apyrase.

3. A method as claimed in claim 1, wherein a mixture of
nucleotide-degrading enzymes is used having nucleoside
triphosphatase, nucleoside diphosphatase and nucleoside
monophosphatase activity.

4. A method as claimed in claim 1, wherein the
nucleotide-degrading enzyme is immobilised on a solid
support.

5. A method as claimed in claim 4, wherein said immobilised
nucleotide-degrading enzyme is added after nucleotide
incorporation by the polymerase has taken place, and
then removed prior to a subsequent nucleotide incorporation
reaction step.

6. A method as claimed in claim 1, wherein PPi release is 
detected using the Enzymatic Luminometric Inorganic Pyro- 
phosphate Detection Assay (ELIDA). 

7. A method as claimed in claim 1, wherein the PPi
detection enzymes are included in the polymerase reaction
step and the polymerase reaction and PPi release detection
steps are performed substantially simultaneously.

8. A method as claimed in claim 1, wherein in the 
polymerase reaction a dATP or ddATP analogue is used 
which is capable of acting as a substrate for a polymerase but 
incapable of acting as a substrate for a PPi detection enzyme. 

9. A method as claimed in claim 8, wherein the dATP
analogue is deoxyadenosine α-thiotriphosphate (dATPαS).

10. A method as claimed in claim 1, further comprising
the use of the α-thio analogues of dCTP, dGTP and dTTP.

11. A method as claimed in claim 1, wherein the sample
DNA is immobilised or provided with means for attachment
to a solid support.

12. A method as claimed in claim 1, wherein the sample
DNA is first amplified.

13. A method as claimed in claim 1, wherein the extension 
primer contains a loop and anneals back on itself and the 3' 
end of the sample DNA. 

14. A method as claimed in claim 1, wherein an exonuclease
deficient (exo-) high fidelity polymerase is used.

15. A method as claimed in claim 1, for identification of 
a base in a single target position in a DNA sequence 
comprising subjecting the sample DNA to amplification; 
immobilizing the amplified DNA and then subjecting the 
immobilized DNA to strand separation, removing the non- 
immobilized strand and providing an extension primer, 
which hybridizes to the immobilized DNA immediately 
adjacent to the target position; subjecting each of four 
aliquots of the immobilized single stranded DNA to a 
polymerase reaction in the presence of a dideoxynucleotide, 
each aliquot using a different dideoxynucleotide whereby 
only the dideoxynucleotide complementary to the base in the 
target position becomes incorporated; subjecting the four 
aliquots to extension in the presence of all four 
deoxynucleotides, whereby in each aliquot the DNA which 
has not reacted with the dideoxynucleotide is extended to 
form double stranded DNA while the dideoxy-blocked DNA
remains as single stranded DNA; followed by identifying the 
double stranded and/or single stranded DNA to indicate 
which dideoxynucleotide was incorporated and hence which 
base was present in the target position. 

16. A kit for use in a method as defined in claim 1,
comprising:

(a) a polymerase;

(b) detection enzyme means for identifying pyrophosphate
release;

(c) a mixture of nucleotide-degrading enzymes;

(d) deoxynucleotides, or optionally deoxynucleotide
analogues, optionally including, in place of dATP, a dATP
analogue which is capable of acting as a substrate for a
polymerase but incapable of acting as a substrate for a
said PPi-detection enzyme; and

(e) optionally a test specific primer which hybridises to
sample DNA so that the target position is directly adjacent
to the 3' end of the primer; and

(f) optionally dideoxynucleotides, or optionally
dideoxynucleotide analogues, optionally ddATP being replaced
by a ddATP analogue which is capable of acting as a
substrate for a polymerase but incapable of acting as a
substrate for a said PPi-detection enzyme.

17. A method of identifying a base at a target position in
a sample DNA sequence comprising arranging a multiplicity
of DNA sequences in array format on a solid surface,
providing to each sample an extension primer, which hybridizes
to the sample DNA immediately adjacent to the target
position and subjecting the sample DNA and extension
primer to a polymerase reaction in the presence of a
deoxynucleotide or dideoxynucleotide whereby the deoxynucleotide
or dideoxynucleotide will only become incorporated
and release pyrophosphate (PPi) if it is complementary to
the base in the target position and detecting any release of
PPi enzymically, different deoxynucleotides or
dideoxynucleotides being added either to separate aliquots of
sample-primer mixture or successively to the same
sample-primer mixture and subjected to the polymerase reaction to
indicate which deoxynucleotide or dideoxynucleotide is
incorporated whereby a nucleotide-degrading enzyme is
included during the polymerase reaction step such that
unincorporated nucleotides are degraded, and whereby any
release of PPi is indicative of incorporation of
deoxynucleotide or dideoxynucleotide and the identification of a base
complementary thereto.
[57] ABSTRACT

The present invention provides a method of PCR, methylation
specific PCR (MSP), for rapid identification of DNA
methylation patterns in a CpG-containing nucleic acid. MSP
uses agents to modify unmethylated cytosine in a nucleic
acid of interest and then uses the PCR reaction to amplify
the CpG-containing nucleic acid in the specimen by means
of CpG-specific oligonucleotide primers. The oligonucleotide
primers distinguish between modified methylated and
nonmethylated nucleic acid. Kits utilizing MSP for the
detection of methylated CpG-containing nucleic acids are
also provided.

27 Claims, 3 Drawing Sheets 
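
The discrimination step summarised in the abstract can be illustrated with a
minimal Python sketch. It assumes that the modifying agent acts like sodium
bisulfite, so that an unmethylated cytosine is read as thymine after
amplification while a methylated cytosine is left unchanged; the abstract
itself does not name the agent. The locus, the primers and the function names
are hypothetical, and primer binding is reduced to an exact match on the
converted strand, which is a deliberate simplification.

```python
# Toy model of methylation-specific discrimination after cytosine conversion.
# Assumption (not stated in the abstract): unmethylated C is read as T after
# modification and PCR; methylated C is left unchanged.


def modify_unmethylated_cytosines(sequence, methylated_positions):
    """Return the sequence as read after modification and amplification."""
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(sequence)
    )


def primer_matches(primer, converted_sequence):
    """Simplified stand-in for primer annealing: exact substring match."""
    return primer in converted_sequence


# Hypothetical locus with CpG cytosines at positions 1 and 5.
LOCUS = "ACGTTCGGCA"
CPG_CYTOSINES = {1, 5}

# Hypothetical primers written against the converted strand.
METHYLATED_PRIMER = "ACGTTCGG"
UNMETHYLATED_PRIMER = "ATGTTTGG"

if __name__ == "__main__":
    for label, positions in (("methylated", CPG_CYTOSINES),
                             ("unmethylated", set())):
        converted = modify_unmethylated_cytosines(LOCUS, positions)
        print(label, converted,
              "M primer:", primer_matches(METHYLATED_PRIMER, converted),
              "U primer:", primer_matches(UNMETHYLATED_PRIMER, converted))
```

In this toy model each primer matches only the template whose methylation
state it was written for, because the two converted sequences differ at every
CpG cytosine.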
METHOD OF DETECTION OF
METHYLATED NUCLEIC ACID USING
AGENTS WHICH MODIFY
UNMETHYLATED CYTOSINE AND
DISTINGUISHING MODIFIED
METHYLATED AND NON-METHYLATED
NUCLEIC ACIDS

This invention was made with government support under
Grant Nos. CA43318 and CA54396 awarded by the
National Institutes of Health. The government has certain
rights in this invention.



FIELD OF THE INVENTION 

The present invention relates generally to regulation of 
gene expression, and more specifically to a method of 
determining the DNA methylation status of CpG sites in a 
given locus. 

BACKGROUND OF THE INVENTION

In higher order eukaryotes DNA is methylated only at
cytosines located 5' to guanosine in the CpG dinucleotide.
This modification has important regulatory effects on gene
expression, especially when involving CpG rich areas,
known as CpG islands, located in the promoter regions of
many genes. While almost all gene-associated islands are
protected from methylation on autosomal chromosomes,
extensive methylation of CpG islands has been associated
with transcriptional inactivation of selected imprinted genes
and genes on the inactive X-chromosome of females.
Aberrant methylation of normally unmethylated CpG
islands has been described as a frequent event in immortalized
and transformed cells, and has been associated with
transcriptional inactivation of defined tumor suppressor
genes in human cancers.

Human cancer cells typically contain somatically altered
genomes, characterized by mutation, amplification, or deletion
of critical genes. In addition, the DNA template from
human cancer cells often displays somatic changes in DNA
methylation (E. R. Fearon, et al., Cell, 61:759, 1990; P. A.
Jones, et al., Cancer Res., 46:461, 1986; R. Holliday,
Science, 238:163, 1987; A. De Bustros, et al., Proc. Natl.
Acad. Sci., USA, 85:5693, 1988; P. A. Jones, et al., Adv.
Cancer Res., 54:1, 1990; S. B. Baylin, et al., Cancer Cells,
3:383, 1991; M. Makos, et al., Proc. Natl. Acad. Sci., USA,
89:1929, 1992; N. Ohtani-Fujita, et al., Oncogene, 8:1063,
1993). However, the precise role of abnormal DNA methylation
in human tumorigenesis has not been established.
DNA methylases transfer methyl groups from the universal
methyl donor S-adenosyl methionine to specific sites on the
DNA. Several biological functions have been attributed to
the methylated bases in DNA. The most established biological
function is the protection of the DNA from digestion
by cognate restriction enzymes. The restriction modification
phenomenon has, so far, been observed only in bacteria.
Mammalian cells, however, possess a different methylase
that exclusively methylates cytosine residues on the DNA
that are 5' neighbors of guanine (CpG). This methylation has
been shown by several lines of evidence to play a role in
gene activity, cell differentiation, tumorigenesis,
X-chromosome inactivation, genomic imprinting and other
major biological processes (Razin, A., H., and Riggs, R. D.
eds. in DNA Methylation Biochemistry and Biological
Significance, Springer-Verlag, N.Y., 1984).

A CpG rich region, or "CpG island", has recently been
identified at 17p13.3, which is aberrantly hypermethylated
in multiple common types of human cancers (Makos, M., et
al., Proc. Natl. Acad. Sci. USA, 89:1929, 1992; Makos, M.,
et al., Cancer Res., 53:2715, 1993; Makos, M., et al., Cancer
Res., 53:2719, 1993). This hypermethylation coincides with
timing and frequency of 17p losses and p53 mutations in
brain, colon, and renal cancers. Silenced gene transcription
associated with hypermethylation of the normally unmethylated
promoter region CpG islands has been implicated as an
alternative mechanism to mutations of coding regions for
inactivation of tumor suppressor genes (Baylin, S. B., et al.,
Cancer Cells, 3:383, 1991; Jones, P. A. and Buckley, J. D.,
Adv. Cancer Res., 54:1-23, 1990). This change has now
been associated with the loss of expression of VHL, a renal
cancer tumor suppressor gene on 3p (J. G. Herman, et al.,
Proc. Natl. Acad. Sci. USA, 91:9700-9704, 1994), the estrogen
receptor gene on 6q (Ottaviano, Y. L., et al., Cancer
Res., 54:2552, 1994) and the H19 gene on 11p (Steenman,
M. J. C., et al., Nature Genetics, 7:433, 1994).

In eukaryotic cells, methylation of cytosine residues that
are immediately 5' to a guanosine occurs predominantly in
CG poor regions (Bird, A., Nature, 321:209, 1986). In
contrast, discrete regions of CG dinucleotides called CpG
islands remain unmethylated in normal cells, except during
X-chromosome inactivation (Migeon, et al., supra) and
parental specific imprinting (Li, et al., Nature, 366:362,
1993) where methylation of 5' regulatory regions can lead to
transcriptional repression. De novo methylation of the Rb
gene has been demonstrated in a small fraction of retinoblastomas
(Sakai, et al., Am. J. Hum. Genet., 48:880, 1991),
and recently, a more detailed analysis of the VHL gene
showed aberrant methylation in a subset of sporadic renal
cell carcinomas (Herman, et al., Proc. Natl. Acad. Sci.,
U.S.A., 91:9700, 1994). Expression of a tumor suppressor
gene can also be abolished by de novo DNA methylation of
a normally unmethylated 5' CpG island (Issa, et al., Nature
Genet., 7:536, 1994; Herman, et al., supra; Merlo, et al.,
Nature Med., 1:686, 1995; Herman, et al., Cancer Res.,
56:722, 1996; Graff, et al., Cancer Res., 55:5195, 1995;
Herman, et al., Cancer Res., 55:4525, 1995).

Most of the methods developed to date for detection of 
methylated cytosine depend upon cleavage of the phos- 
phodiester bond alongside cytosine residues, using either 
methylation-sensitive restriction enzymes or reactive chemicals
such as hydrazine which differentiate between cytosine
and its 5-methyl derivative. The use of methylation-sensitive
enzymes suffers from the disadvantage that it is not of
general applicability, since only a limited proportion of
potentially methylated sites in the genome can be analyzed. 
Genomic sequencing protocols which identify a 5-MeC 
residue in genomic DNA as a site that is not cleaved by any 
of the Maxam Gilbert sequencing reactions, are a substantial 
improvement on the original genomic sequencing method, 
but still suffer disadvantages such as the requirement for 
large amount of genomic DNA and the difficulty in detecting 
a gap in a sequencing ladder which may contain bands of 
varying intensity. 

Mapping of methylated regions in DNA has relied primarily
on Southern hybridization approaches, based on the
inability of methylation-sensitive restriction enzymes to
cleave sequences which contain one or more methylated
CpG sites. This method provides an assessment of the
overall methylation status of CpG islands, including some
quantitative analysis, but is relatively insensitive, requires
large amounts of high molecular weight DNA and can only
provide information about those CpG sites found within
sequences recognized by methylation-sensitive restriction
enzymes. A more sensitive method of detecting methylation
patterns combines the use of methylation-sensitive enzymes
and the polymerase chain reaction (PCR). After digestion of
DNA with the enzyme, PCR will amplify from primers
flanking the restriction site only if DNA cleavage was
prevented by methylation. Like Southern-based approaches,
this method can only monitor CpG methylation in
methylation-sensitive restriction sites. Moreover, the restriction
of unmethylated DNA must be complete, since any
uncleaved DNA will be amplified by PCR, yielding a false
positive result for methylation. This approach has been
useful in studying samples where a high percentage of
alleles of interest are methylated, such as the study of
imprinted genes and X-chromosome inactivated genes.
However, difficulties in distinguishing between incomplete
restriction and low numbers of methylated alleles make this
approach unreliable for detection of tumor suppressor gene
hypermethylation in small samples where methylated alleles
represent a small fraction of the population.

Another method that avoids the use of restriction endonucleases
utilizes bisulfite treatment of DNA to convert all
unmethylated cytosines to uracil. The altered DNA is amplified
and sequenced to show the methylation status of all CpG
sites. However, this method is technically difficult, labor
intensive and, without cloning amplified products, it is less
sensitive than Southern analysis, requiring approximately
10% of the alleles to be methylated for detection.

Identification of the earliest genetic changes in tumori- 
genesis is a major focus in molecular cancer research. 
Diagnostic approaches based on identification of these 
changes are likely to allow implementation of early detec- 
tion strategies and novel therapeutic approaches targeting 
these early changes might lead to more effective cancer 
treatment.

The precise mapping of DNA methylation patterns in
CpG islands has become essential for understanding diverse
biological processes such as the regulation of imprinted
genes, X-chromosome inactivation, and tumor suppressor
gene silencing in human cancer. The present invention
provides a method for rapid assessment of the methylation
status of any group of CpG sites within a CpG island,
independent of the use of methylation-sensitive restriction
enzymes.

The method of the invention includes modification of
DNA by sodium bisulfite or a comparable agent which
converts all unmethylated but not methylated cytosines to
uracil, and subsequent amplification with primers specific
for methylated versus unmethylated DNA. This method of
"methylation specific PCR" or MSP, requires only small
amounts of DNA, is sensitive to 0.1% of methylated alleles
of a given CpG island locus, and can be performed on DNA
extracted from paraffin-embedded samples, for example.
MSP eliminates the false positive results inherent to previous
PCR-based approaches which relied on differential
restriction enzyme cleavage to distinguish methylated from
unmethylated DNA.
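
As a minimal sketch of the conversion step described above, the following Python function models the net effect of bisulfite treatment followed by PCR on a single strand: every unmethylated cytosine is read as T, and every methylated cytosine remains C. The function name, the toy template, and the representation of methylation as a set of 0-based positions are illustrative assumptions and are not taken from the specification.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Model bisulfite treatment plus PCR on one DNA strand.

    seq        -- DNA sequence (A, C, G, T), read 5' to 3'
    methylated -- 0-based positions of 5-methylcytosines (assumed input)

    Unmethylated C is deaminated to U and amplified as T;
    methylated C is left unchanged.
    """
    converted = []
    for pos, base in enumerate(seq.upper()):
        if base == "C" and pos not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical toy template with CpG cytosines at positions 1 and 5.
template = "ACGTCCGA"
fully_methylated = bisulfite_convert(template, methylated={1, 5})
unmethylated = bisulfite_convert(template)
```

After conversion the two products differ at every CpG cytosine, which is the sequence difference that methylation-specific primers are designed to exploit.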

In a particular aspect of the invention, MSP is useful for
identifying promoter region hypermethylation changes associated
with transcriptional inactivation in tumor suppressor
genes, for example, p16, p15, E-cadherin and VHL, in
human neoplasia.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows genomic sequencing of p16. The sequence
shown has the most 5' region at the bottom of the gel,
beginning at +175 in relation to a major transcriptional start
site (Hara, et al., Mol. Cell Biol., 16:859, 1996). All cytosines
in the unmethylated cell line H249 have been converted to
thymidine, while all C's in CpG dinucleotides in the methylated
cell line H157 remain as C, indicating methylation. ]
encloses a BstUI site which is at -59 in relation to the
translational start site in Genbank sequence U12818
(Hussussian, et al., Nat. Genet., 8:15, 1994), but which is
incorrectly identified as CGCA in sequence X94154 (Hara,
et al., supra). This CGCG site represents the 3' location of
the sense primer used for p16 MSP.

FIGS. 2A-2E show polyacrylamide gels with the Methylation
Specific PCR products of p16. Primer sets used for
amplification are designated as unmethylated (U), methylated
(M), or unmodified/wild-type (W). * designates the
molecular weight marker pBR322-MspI digest. Panel A
shows amplification of bisulfite-treated DNA from cancer
cell lines and normal lymphocytes, and untreated DNA
(from cell line H249). Panel B shows mixing of various
amounts of H157 DNA with 1 µg of H249 DNA prior to
bisulfite treatment to assess the detection sensitivity of MSP
for methylated alleles. Modified DNA from a primary lung
cancer sample and normal lung are also shown. Panel C
shows amplification with the p16-U2 (U) primers, and
p16-M2 (M) described in Table 1. Panel D shows the
amplified p16 products of panel C restricted with BstUI (+)
or not restricted (-). Panel E shows results of testing for
regional methylation of CpG islands with MSP, using sense
primers p16-U2 (U) and p16-M2 (M), which are methylation
specific, and an antisense primer which is not methylation
specific.

FIGS. 3A-3E show polyacrylamide gels of MSP products
from analysis of several genes. Primer sets used for amplification
are not designated as unmethylated (U), methylated
(M), or unmodified/wild-type (W). * designates the molecular
weight marker pBR322-MspI digest and ** designates
the 123 bp molecular weight marker. All DNA samples were
bisulfite treated except those designated untreated. Panel A
shows the results from MSP for p15. Panel B shows the p15
products restricted with BstUI (+) or not restricted (-). Panel
C shows the products of MSP for VHL. Panel D shows the
VHL products restricted with BstUI (+) or not restricted (-).
Panel E shows the products of MSP for E-cadherin.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

The present invention provides methylation specific PCR
(MSP) for identification of DNA methylation patterns. MSP
uses the PCR reaction itself to distinguish between methylated
and unmethylated DNA, which adds an improved
sensitivity of methylation detection.

Unlike previous genomic sequencing methods for methylation
identification, which utilize amplification primers
that are specifically designed to avoid the CpG sequences,
MSP primers are specifically designed to recognize CpG
sites to take advantage of the differences in methylation to
amplify specific products to be identified by the invention
assay.

As illustrated in the Examples below, MSP provides
significant advantages over previous PCR and other methods
used for assaying methylation. MSP is markedly more
sensitive than Southern analyses, facilitating detection of
low numbers of methylated alleles and the study of DNA
from small samples. MSP allows the study of paraffin-embedded
materials, which could not previously be analyzed
by Southern analysis. MSP also allows examination of
all CpG sites, not just those within sequences recognized by
methylation-sensitive restriction enzymes. This markedly
increases the number of such sites which can be assessed and
will allow rapid, fine mapping of methylation patterns
throughout CpG rich regions. MSP also eliminates the
frequent false positive results due to partial digestion of
methylation-sensitive enzymes inherent in previous PCR
methods for detecting methylation. Furthermore, with MSP,
simultaneous detection of unmethylated and methylated
products in a single sample confirms the integrity of DNA as
a template for PCR and allows a semi-quantitative assessment
of allele types which correlates with results of Southern
analysis. Finally, the ability to validate the amplified
product by differential restriction patterns is an additional
advantage.

The only technique that can provide more direct analysis
than MSP for most CpG sites within a defined region is
genomic sequencing. However, MSP can provide similar
information and has the following advantages. First, MSP is
[illegible]. In contrast, genomic sequencing, amplification,
cloning, and subsequent sequencing may take days. MSP also
avoids the use of expensive sequencing reagents and the use
of radioactivity. Both of these factors make MSP better
suited for the analysis of large numbers of samples. Third,
the use of PCR as the step to distinguish methylated from
unmethylated DNA in MSP allows for significant increase in
the sensitivity of methylation detection. For example, if
cloning is not used prior to genomic sequencing of the DNA,
less than 10% methylated DNA in a background of
unmethylated DNA cannot be seen (Myohanen, et al., supra).
The use of PCR and cloning does allow sensitive detection of
methylation patterns in very small amounts of DNA by
genomic sequencing (Frommer, et al., Proc. Natl. Acad. Sci.
USA, 89:1827, 1992; Clark, et al., Nucleic Acids Research,
22:2990, 1994). However, this means in practice that it
would require sequencing analysis of 10 clones to detect 10%
methylation, 100 clones to detect 1% methylation, and to
reach the level of sensitivity we have demonstrated with MSP
(1:1000), one would have to sequence 1000 individual clones.

In a first embodiment, the invention provides a method for
detecting a methylated CpG-containing nucleic acid, the
method including contacting a nucleic acid-containing
specimen with an agent that modifies unmethylated
cytosine; amplifying the CpG-containing nucleic acid in the
specimen by means of CpG-specific oligonucleotide primers;
and detecting the methylated nucleic acid. It is understood
that while the amplification step is optional, it is
desirable in the preferred method of the invention.

The term "modifies" as used herein means the conversion
of an unmethylated cytosine to another nucleotide which
will distinguish the unmethylated from the methylated
cytosine. Preferably, the agent modifies unmethylated
cytosine to uracil. Preferably, the agent used for modifying
unmethylated cytosine is sodium bisulfite; however, other
agents that similarly modify unmethylated cytosine, but not
methylated cytosine, can also be used in the method of the
invention. Sodium bisulfite (NaHSO3) reacts readily with
the 5,6-double bond of cytosine, but poorly with methylated
cytosine. Cytosine reacts with the bisulfite ion to form a
sulfonated cytosine reaction intermediate which is susceptible
to deamination, giving rise to a sulfonated uracil. The
sulfonate group can be removed under alkaline conditions,
resulting in the formation of uracil. Uracil is recognized as
a thymine by Taq polymerase and therefore upon PCR, the
resultant product contains cytosine only at the position
where 5-methylcytosine occurs in the starting template
DNA.

The primers used in the invention for amplification of the
CpG-containing nucleic acid in the specimen, after bisulfite
modification, specifically distinguish between untreated
DNA, methylated, and non-methylated DNA. MSP primers
for the non-methylated DNA preferably have a T in the 3'
CG pair to distinguish it from the C retained in methylated
DNA, and the complement is designed for the antisense
primer. MSP primers usually contain relatively few Cs or Gs
in the sequence since the Cs will be absent in the sense
primer and the Gs absent in the antisense primer (C becomes
modified to U (uracil) which is amplified as T (thymidine)
in the amplification product).

The primers of the invention embrace oligonucleotides of
sufficient length and appropriate sequence so as to provide
specific initiation of polymerization on a significant number
of nucleic acids in the polymorphic locus. Specifically, the
term "primer" as used herein refers to a sequence comprising
[illegible] deoxyribonucleotides or ribonucleotides,
preferably more than three, and most preferably more than
8, which sequence is capable of initiating synthesis of a
primer extension product, which is substantially complementary
to a polymorphic locus strand. Environmental conditions
conducive to synthesis include the presence of
nucleoside triphosphates and an agent for polymerization,
such as DNA polymerase, and a suitable temperature and
pH. The primer is preferably single stranded for maximum
efficiency in amplification, but may be double stranded. If
double stranded, the primer is first treated to separate its
strands before being used to prepare extension products.
Preferably, the primer is an oligodeoxyribonucleotide. The
primer must be sufficiently long to prime the synthesis of
extension products in the presence of the inducing agent for
polymerization. The exact length of primer will depend on
many factors, including temperature, buffer, and nucleotide
composition. The oligonucleotide primer typically contains
12-20 or more nucleotides, although it may contain fewer
nucleotides.

[illegible] complementary to each strand of the genomic
locus to be [illegible] sufficiently complementary to
hybridize with their respective strands under conditions
which allow the agent for polymerization to perform. In
other words, the primers should have sufficient
complementarity with the 5' and 3' flanking sequences to
hybridize therewith and permit amplification of the genomic
locus.

Oligonucleotide primers of the invention are employed in
the amplification process which is an enzymatic chain
reaction that produces exponential quantities of target locus
relative to the number of reaction steps involved. Typically,
one primer is complementary to the negative (-) strand of
the locus and the other is complementary to the positive (+)
strand. Annealing the primers to denatured nucleic acid
followed by extension with an enzyme, such as the large
fragment of DNA Polymerase I (Klenow) and nucleotides,
results in newly synthesized + and - strands containing the
target locus sequence. Because these newly synthesized
sequences are also templates, repeated cycles of denaturing,
primer annealing, and extension result in exponential
production of the region (i.e., the target locus sequence)
defined by the primer. The product of the chain reaction is a
discrete nucleic acid duplex with termini corresponding to
the ends of the specific primers employed.

The oligonucleotide primers of the invention may be
prepared using any suitable method, such as conventional
phosphotriester and phosphodiester methods or automated
embodiments thereof. In one such automated embodiment
diethylphosphoramidites are used as starting materials and
may be synthesized as described by Beaucage, et al.
(Tetrahedron Letters, 22:1859-1862, 1981). One method for
synthesizing oligonucleotides on a modified solid support is
described in U.S. Pat. No. 4,458,066.

Any nucleic acid specimen, in purified or nonpurified
form, can be utilized as the starting nucleic acid or acids,
provided it contains, or is suspected of containing, the
specific nucleic acid sequence containing the target locus
(e.g., CpG). Thus, the process may employ, for example,
DNA or RNA, including messenger RNA, wherein DNA or
RNA may be single stranded or double stranded. In the event
that RNA is to be used as a template, enzymes and/or
conditions optimal for reverse transcribing the template to
DNA would be utilized. In addition, a DNA-RNA hybrid
which contains one strand of each may be utilized. A mixture
of nucleic acids may also be employed, or the nucleic acids
produced in a previous amplification reaction herein, using
the same or different primers, may be so utilized. The specific
nucleic acid sequence to be amplified, i.e., the target locus,
may be a fraction of a larger molecule or can be present
initially as a discrete molecule, so that the specific sequence
constitutes the entire nucleic acid. It is not necessary that the
sequence to be amplified be present initially in a pure form;
it may be a minor fraction of a complex mixture, such as
contained in whole human DNA.

The nucleic acid-containing specimen used for detection
of methylated CpG may be from any source including brain,
colon, urogenital, hematopoietic, thymus, testis, ovarian,
uterine, prostate, breast, colon, lung and renal tissue and
may be extracted by a variety of techniques such as that
described by Maniatis, et al. (Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor, N.Y., pp 280, 281,
1982).

If the extracted sample is impure (such as plasma, serum,
or blood or a sample embedded in paraffin), it may be treated
before amplification with an amount of a reagent effective to
open the cells, fluids, tissues, or animal cell membranes of
the sample, and to expose and/or separate the strand(s) of the
nucleic acid(s). This lysing and nucleic acid denaturing step
to expose and separate the strands will allow amplification
to occur much more readily.

Where the target nucleic acid sequence of the sample
contains two strands, it is necessary to separate the strands
of the nucleic acid before it can be used as the template.
Strand separation can be effected either as a separate step or
simultaneously with the synthesis of the primer extension
products. This strand separation can be accomplished using
various suitable denaturing conditions, including physical,
chemical, or enzymatic means; the word "denaturing"
includes all such means. One physical method of separating
nucleic acid strands involves heating the nucleic acid until it
is denatured. Typical heat denaturation may involve temperatures
ranging from about 80° to 105° C. for times
ranging from about 1 to 10 minutes. Strand separation may
also be induced by an enzyme from the class of enzymes
known as helicases or by the enzyme RecA, which has
helicase activity, and in the presence of riboATP, is known
to denature DNA. The reaction conditions suitable for strand
separation of nucleic acids with helicases are described by
Kuhn Hoffmann-Berling (CSH-Quantitative Biology, 43:63,
1978) and techniques for using RecA are reviewed in C.
Radding (Ann. Rev. Genetics, 16:405-437, 1982).

When complementary strands of nucleic acid or acids are
separated, regardless of whether the nucleic acid was originally
double or single stranded, the separated strands are
ready to be used as a template for the synthesis of additional
nucleic acid strands. This synthesis is performed under
conditions allowing hybridization of primers to templates to
occur. Generally synthesis occurs in a buffered aqueous
solution, preferably at a pH of 7-9, most preferably about 8.
Preferably, a molar excess (for genomic nucleic acid, usually
about 10^[illegible]:1 primer:template) of the two oligonucleotide
primers is added to the buffer containing the separated
template strands. It is understood, however, that the amount
of complementary strand may not be known if the process of
the invention is used for diagnostic applications, so that the
amount of primer relative to the amount of complementary
strand cannot be determined with certainty. As a practical
matter, however, the amount of primer added will generally
be in molar excess over the amount of complementary strand
(template) when the sequence to be amplified is contained in
a mixture of complicated long-chain nucleic acid strands. A
large molar excess is preferred to improve the efficiency of
the process.

The deoxyribonucleoside triphosphates dATP, dCTP,
dGTP, and dTTP are added to the synthesis mixture, either
separately or together with the primers, in adequate amounts
and the resulting solution is heated to about 90°-100° C.
from about 1 to 10 minutes, preferably from 1 to 4 minutes.
After this heating period, the solution is allowed to cool to
room temperature, which is preferable for the primer hybridization.
To the cooled mixture is added an appropriate agent
for effecting the primer extension reaction (called herein
"agent for polymerization"), and the reaction is allowed to
occur under conditions known in the art. The agent for
polymerization may also be added together with the other
reagents if it is heat stable. This synthesis (or amplification)
reaction may occur at room temperature up to a temperature
above which the agent for polymerization no longer functions.
Thus, for example, if DNA polymerase is used as the
agent, the temperature is generally no greater than about 40°
C. Most conveniently the reaction occurs at room temperature.

The agent for polymerization may be any compound or
system which will function to accomplish the synthesis of
primer extension products, including enzymes. Suitable
enzymes for this purpose include, for example, E. coli DNA
polymerase I, Klenow fragment of E. coli DNA polymerase
I, T4 DNA polymerase, other available DNA polymerases,
polymerase muteins, reverse transcriptase, and other
enzymes, including heat-stable enzymes (i.e., those enzymes
which perform primer extension after being subjected to
temperatures sufficiently elevated to cause denaturation).
Suitable enzymes will facilitate combination of the nucleotides
in the proper manner to form the primer extension
products which are complementary to each locus nucleic
acid strand. Generally, the synthesis will be initiated at the
3' end of each primer and proceed in the 5' direction along
the template strand, until synthesis terminates, producing
molecules of different lengths. There may be agents for
polymerization, however, which initiate synthesis at the 5'
end and proceed in the other direction, using the same
process as described above.

Preferably, the method of amplifying is by PCR, as
described herein and as is commonly used by those of
ordinary skill in the art. Alternative methods of amplification
have been described and can also be employed as long as the
methylated and non-methylated loci amplified by PCR using
the primers of the invention are similarly amplified by the
alternative means.
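
The exponential character of PCR amplification can be illustrated with a short calculation. The sketch below assumes an idealized reaction in which every cycle multiplies the number of target copies by (1 + efficiency). The function name and the efficiency parameter are illustrative assumptions, and real reactions plateau in later cycles, which this model ignores.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield after a number of cycles.

    initial_copies -- target molecules present before cycling
    cycles         -- number of denature/anneal/extend cycles
    efficiency     -- fraction of templates copied per cycle (0.0 to 1.0);
                      1.0 corresponds to perfect doubling
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Under perfect doubling, one template yields 2**10 copies after 10 cycles.
ten_cycle_yield = copies_after_cycles(1, 10)
```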

The amplified products are preferably identified as methylated
or non-methylated by sequencing. Sequences amplified
by the methods of the invention can be further evaluated,
detected, cloned, sequenced, and the like, either in solution
or after binding to a solid support, by any method usually
applied to the detection of a specific DNA sequence such as
PCR, oligomer restriction (Saiki, et al., Bio/Technology,
3:1008-1012, 1985), allele-specific oligonucleotide (ASO)
probe analysis (Conner, et al., Proc. Natl. Acad. Sci. USA,
80:278, 1983), oligonucleotide ligation assays (OLAs)
(Landegren, et al., Science, 241:1077, 1988), and the like.
Molecular techniques for DNA analysis have been reviewed
(Landegren, et al., Science, 242:229-237, 1988).

Optionally, the methylation pattern of the nucleic acid can
be confirmed by restriction enzyme digestion and Southern
blot analysis. Examples of methylation sensitive restriction
endonucleases which can be used to detect 5' CpG methylation
include SmaI, SacII, EagI, MspI, HpaII, BstUI and
BssHII, for example.
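
Validation of an amplified product by differential restriction can be sketched in the same way as the conversion step. The recognition site described for BstUI in this specification is CGCG, so after bisulfite treatment and PCR the site survives only where both of its cytosines were methylated. The function names, the toy template, and the modelling of methylation as a set of 0-based positions are assumptions made for illustration only.

```python
BSTUI_SITE = "CGCG"  # recognition sequence described for BstUI in the text


def bisulfite_convert(seq, methylated=frozenset()):
    """Read unmethylated C as T; leave methylated C (by position) as C."""
    return "".join(
        "T" if base == "C" and pos not in methylated else base
        for pos, base in enumerate(seq.upper())
    )


def bstui_sites(seq):
    """Return the 0-based start position of every CGCG in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == BSTUI_SITE]


# Hypothetical template with CpG cytosines at positions 2 and 4.
template = "TTCGCGAA"
methylated_product = bisulfite_convert(template, methylated={2, 4})
unmethylated_product = bisulfite_convert(template)

cut_in_methylated = bstui_sites(methylated_product)
cut_in_unmethylated = bstui_sites(unmethylated_product)
```

In this model only the product derived from methylated template retains the site, so digestion would be expected only in that product.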

Exemplaiy target polynucleotide sequences to which the 
primer hybridizes have a sequence as listed below. 
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-continued 




SEQIDNO: 


5'-CCCAAACCAAACACCACAAA-3'; 


45 


5 -TIACK3TIAGAGGGTEA[rCGCC3T-3" and 


46 


5*-TAACTAAAAArrcACCTACCGAC-3'; and 


47 


S'-TAATlTTAGGTIAGAGGGTTArTGT-S' and 


48 


5'-CACAACCAArCAACAACACA-3*. 


49 



*Also included are modifications of the above sequences, incltxiing SEQ ID 
10 NO:26 having tbe sequence TCAC at die 5" ead; SEQ ID NO:27 having the 
sequence CC added at die S' end; SEQ ID NO:2S havipg die sequence 
5'-TIAnAGAGGGTGGGGCGGArCGC-3'; SEQ ID NO: 29 having die 
sequeoce 5'-GACCCCGAACCGCGACCGTAA-3'; SEQ ID NO:30 having 
dK sequence TGG added at die 5' end; and SEQ ID NO:31 having die 
sequence TACC added at the 5' end. All of these modified primers anneal at 

15 

Typically, the CpG-containing nucleic acid is in the region 
of the promoter of a structural gene. For exair^Ie, the 
promoter region of tumor suppressor genes have been iden- 
tJiied as containing methylated CpG island. The promoter 



                                                  SEQ ID NO. 

Wild type p16 
    5'-GCGGTCCGCCCCACCCTCTG-3';                        1 
    5'-CCACGGCCGCGGCCCG-3';                            2 
Methylated p16-1* 
    5'-GCGATCCGCCCCACCCTCTAATAA-3';                    3 
    5'-TTACGGTCGCGGTTCGGGGTC-3';                       4 
Unmethylated p16-1 
    5'-ACAATCCACCCCACCCTCTAATAA-3';                    5 
    5'-TTATGGTTGTGGTTTGGGGTTG-3';                      6 
Methylated p16-2 
    5'-GCGATCCGCCCCACCCTCTAATAA-3';                    7 
    5'-CGGTCGGAGGTCGATTTAGGTGG-3';                     8 
Unmethylated p16-2 
    5'-ACAATCCACCCCACCCTCTAATAA-3';                    9 
    5'-TGGTTGGAGGTTGATTTAGGTGG-3';                    10 
Wild type p15 
    5'-TCTGGCCGCAGGGTGCG-3';                          11 
    5'-CCGGCCGCTCGGCCACT-3';                          12 
Methylated p15 
    5'-AACCGCAAAATACGAACGC-3';                        13 
    5'-TCGGTCGTTCGGTTATTGTACG-3';                     14 
Unmethylated p15 
    5'-AACCACAAAATACAAACACATCACA-3';                  15 
    5'-TTGGTTGTTTGGTTATTGTATGG-3';                    16 
Methylated VHL 
    5'-GCGTACGCAAAAAAATCCTCCA-3';                     17 
    5'-TTCGCGGCGTTCGGTTC-3';                          18 
Unmethylated VHL 
    5'-ACATACACAAAAAAATCCTCCAAC-3';                   19 
    5'-TTTGTGGTGTTTGGTTTGGG-3';                       20 
Methylated E-cadherin 
    5'-ACGCGATAACCCTCTAACCTAA-3';                     21 
    5'-GTCGGTAGGTGAATTTTTAGTTA-3';                    22 
Unmethylated E-cadherin 
    5'-ACAATAACCCTCTAACCTAAAATTA-3'; and              23 
    5'-TGTGTTGTTGATTGGTTGTG-3'.                       24 



Exemplary primer pairs included in the invention that 
hybridize to the above sequences include: 

                                              SEQ ID NO: 

5'-CAGAGGGTGGGGCGGACCGC-3' and                    26 
5'-CGGGCCGCGGCCGTGG-3';                           27 
5'-TTATTAGAGGGTGGGGCGGATCGC-3' and                28 
5'-GACCCCGAACCGCGACCGTAA-3';                      29 
5'-TTATTAGAGGGTGGGGTGGATTGT-3' and                30 
5'-CAACCCCAAACCACAACCATAA-3';                     31 
5'-TTATTAGAGGGTGGGGCGGATCGC-3' and                32 
5'-CCACCTAAATCGACCTCCGACCG-3';                    33 
5'-TTATTAGAGGGTGGGGTGGATTGT-3' and                34 
5'-CCACCTAAATCAACCTCCAACCA-3';                    35 
5'-CGCACCCTGCGGCCAGA-3' and                       36 
5'-AGTGGCCGAGCGGCCGG-3';                          37 
5'-GCGTTCGTATTTTGCGGTT-3' and                     38 
5'-CGTACAATAACCGAACGACCGA-3';                     39 
5'-TGTGATGTGTTTGTATTTTGTGGTT-3' and               40 
5'-CCATACAATAACCAAACAACCAA-3';                    41 
5'-TGGAGGATTTTTTTGCGTACGC-3' and                  42 
5'-GAACCGAACGCCGCGAA-3';                          43 
5'-GTTGGAGGATTTTTTTGTGTATGT-3' and                44 
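
By way of illustration only, the statement that the primers hybridize to the listed target sequences can be checked by confirming that each primer is the reverse complement of its target. The following Python sketch is not part of the original disclosure; the function name reverse_complement and the dictionary PAIRS are assumptions introduced for this example, and only three target/primer pairs (read from the target table, Table 1 and the Sequence Listing) are shown. 

```python
# Watson-Crick partner of each DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# (target sequence, primer) pairs, both written 5'->3'.
# Targets are SEQ ID NOs 3, 4 and 13 of the target table; primers are the
# p16-M sense, p16-M antisense and p15-M sense primers of Table 1.
PAIRS = {
    "methylated p16, sense primer": (
        "GCGATCCGCCCCACCCTCTAATAA", "TTATTAGAGGGTGGGGCGGATCGC"),
    "methylated p16, antisense primer": (
        "TTACGGTCGCGGTTCGGGGTC", "GACCCCGAACCGCGACCGTAA"),
    "methylated p15, sense primer": (
        "AACCGCAAAATACGAACGC", "GCGTTCGTATTTTGCGGTT"),
}

for name, (target, primer) in PAIRS.items():
    # A primer anneals to its target when it equals the target's
    # reverse complement.
    matches = reverse_complement(target) == primer
    print(f"{name}: primer is reverse complement of target = {matches}")
```

The same check can be extended to the remaining pairs by adding entries to PAIRS. 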

region of tumor suppressor genes, including p16, p15, VHL 
and E-cadherin, are typically the sequence amplified by PCR 
in the method of the invention. 

Detection and identification of methylated CpG-containing 
nucleic acid in the specimen may be indicative of 
a cell proliferative disorder or neoplasia. Such disorders 
include but are not limited to low grade astrocytoma, 
anaplastic astrocytoma, glioblastoma, medulloblastoma, colon 
cancer, lung cancer, renal cancer, leukemia, breast cancer, 
prostate cancer, endometrial cancer and neuroblastoma. 
Identification of methylated CpG status is also useful for 
detection and diagnosis of genomic imprinting, fragile X 
syndrome and X-chromosome inactivation. 

The method of the invention now provides the basis for a 
kit useful for the detection of a methylated CpG-containing 
nucleic acid. The kit includes a carrier means being 
compartmentalized to receive in close confinement therein one 
or more containers. For example, a first container contains a 
reagent which modifies unmethylated cytosine, such as 
sodium bisulfite. A second container contains primers for 
amplification of the CpG-containing nucleic acid, for 
example, primers listed above for p16, p15, VHL or 
E-cadherin. 

The above disclosure generally describes the present 
invention. A more complete understanding can be obtained 
by reference to the following specific examples which are 
provided herein for purposes of illustration only and are not 
intended to limit the scope of the invention. 

EXAMPLE 1 

DNA and Cell Lines. Genomic DNA was obtained from 
cell lines, primary tumors and normal tissue as described 
(Merlo, et al., Nature Medicine, 1:686, 1995; Herman, et al., 
Cancer Research, 56:722, 1996; Graff, et al., Cancer 
Research, 55:5195, 1995). The renal carcinoma cell line was 
kindly provided by Dr. Michael Lehrman of the National 
Cancer Institute, Bethesda, MD. 

Bisulfite Modification. 1 μg of DNA in a volume of 50 μl 
was denatured by NaOH (final 0.2M) for 10 minutes at 37° 
C. For samples with nanogram quantities of human DNA, 1 
μg of salmon sperm DNA (Sigma) was added as carrier prior 
to modification. 30 μl of 10 mM hydroquinone (Sigma) and 
520 μl of 3M sodium bisulfite (Sigma) pH 5, both freshly 
prepared, were added, mixed, and samples were incubated 
under mineral oil at 50° C. for 16 hours. Modified DNA was 
purified using the Wizard™ DNA purification resin 
according to the manufacturer (Promega), and eluted into 50 μl of 
water. Modification was completed by NaOH (final 0.3M) 
treatment for 5 minutes at room temperature, followed by 
ethanol precipitation. 
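
As a rough check on the reagent volumes in the modification step, the final concentrations in the incubation mixture can be estimated with the dilution relation C1 x V1 = C2 x V2. The Python sketch below is illustrative only and is not part of the original disclosure. It assumes that the three volumes are 50 μl of DNA, 30 μl of hydroquinone and 520 μl of bisulfite, and that the small volume of NaOH added for denaturation can be neglected; the function name final_concentration is hypothetical. 

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for C2 (same concentration unit as the stock)."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes as read from the protocol (microliters).
dna_ul = 50.0
hydroquinone_ul = 30.0
bisulfite_ul = 520.0

# Assumption: the NaOH used for denaturation adds negligible volume.
total_ul = dna_ul + hydroquinone_ul + bisulfite_ul

bisulfite_final_molar = final_concentration(3.0, bisulfite_ul, total_ul)
hydroquinone_final_mm = final_concentration(10.0, hydroquinone_ul, total_ul)

print(f"total volume: {total_ul:.0f} ul")
print(f"sodium bisulfite: {bisulfite_final_molar:.2f} M")
print(f"hydroquinone: {hydroquinone_final_mm:.2f} mM")
```

Under these assumptions the mixture is about 600 μl, with sodium bisulfite at roughly 2.6M and hydroquinone at roughly 0.5 mM. 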

Genomic Sequencing. Genomic sequencing of bisulfite 
modified DNA was accomplished using the solid-phase 
DNA sequencing approach (Myohanen, et al., DNA Seq., 
5:1, 1994). 100 ng of bisulfite modified DNA was amplified 
with p16 gene specific primer 
5'-TTTTTTAGAGGATTTGAGGGATAGG-3' (sense) (SEQ 
ID NO:49) and 5'-CTACCTAATTCCAATTCCCCTACA-3' 
(anti-sense) (SEQ ID NO:50). PCR conditions were as 
follows: 96° C. for 3 minutes, 80° C. for 3 minutes, 1 U of 
Taq polymerase (BRL) was added, followed by 35 cycles of 
96° C. for 20 seconds, 56° C. for 20 seconds, 72° C. for 90 
seconds, followed by 5 minutes at 72° C. The PCR mixture 
contained 1X buffer (BRL) with 1.5 mM MgCl2, 20 pmols 
of each primer and 0.2 mM dNTPs. To obtain products for 
sequencing, a second round of PCR was performed with 5 
pmols of nested primers. In this reaction, the sense primer, 
5'-GTTTTCCCAGTCACGACAGTATTAGGAGGAAGAAAGAGGAG-3' 
(SEQ ID NO:51), contains M13-40 
sequence (underlined) introduced as a site to initiate 
sequencing, and the anti-sense primer 
5'-TCCAATTCCCCTACAAACTTC-3' (SEQ ID NO:52) is 
biotinylated to facilitate purification of the product prior to 
sequencing. PCR was performed as above, for 32 cycles 
with 2.5 mM MgCl2. All primers for genomic sequencing 
were designed to avoid any CpGs in the sequence. 
Biotinylated PCR products were purified using streptavidin coated 
magnetic beads (Dynal AB, Norway), and sequencing 
reactions performed with Sequenase™ and M13-40 sequencing 
primer under conditions specified by the manufacturer 
(USB). 

PCR Amplification. Primer pairs described in Table 1 
were purchased from Life Technologies. The PCR mixture 
contained 1X PCR buffer (16.6 mM ammonium sulfate, 67 
mM TRIS pH 8.8, 6.7 mM MgCl2, and 10 mM 
β-mercaptoethanol), dNTPs (each at 1.25 mM), primers 
(300 ng/reaction each), and bisulfite-modified DNA (~50 
ng) or unmodified DNA (50-100 ng) in a final volume of 50 
μl. PCR specific for unmodified DNA also included 5% 
dimethylsulfoxide. Reactions were hot started at 95° C. for 
5 minutes prior to the addition of 1.25 units of Taq 
polymerase (BRL). Amplification was carried out on a Hybaid 
OmniGene temperature cycler for 35 cycles (30 seconds at 
95° C., 30 seconds at the annealing temperature listed in 
Table 1, and 30 seconds at 72° C.), followed by a final 4 
minute extension at 72° C. Controls without DNA were 
performed for each set of PCR reactions. 10 μl of each PCR 
reaction was directly loaded onto non-denaturing 6-8% 
polyacrylamide gels, stained with ethidium bromide, and 
directly visualized under UV illumination. 
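
For illustration, the thermal profile described above can be written down as data and its programmed duration summed. The Python sketch below is not part of the original disclosure. The class Step and the function hold_time_seconds are hypothetical names, the 65° C. annealing temperature is the Table 1 value for one primer set chosen as an example, and ramp times between temperatures are ignored. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold of the thermal profile."""
    label: str
    temp_c: float
    seconds: int


# Hot start at 95 C for 5 minutes before the polymerase is added.
HOT_START = Step("hot start", 95.0, 5 * 60)

# Example annealing temperature (Table 1 lists one value per primer set).
ANNEAL_TEMP_C = 65.0

# One amplification cycle: 30 s denature, 30 s anneal, 30 s extend.
CYCLE = (
    Step("denature", 95.0, 30),
    Step("anneal", ANNEAL_TEMP_C, 30),
    Step("extend", 72.0, 30),
)
N_CYCLES = 35

# Final 4 minute extension at 72 C.
FINAL_EXTENSION = Step("final extension", 72.0, 4 * 60)


def hold_time_seconds(hot_start, cycle, n_cycles, final_extension):
    """Sum the programmed hold times; temperature ramping is not modelled."""
    per_cycle = sum(step.seconds for step in cycle)
    return hot_start.seconds + n_cycles * per_cycle + final_extension.seconds


total = hold_time_seconds(HOT_START, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"programmed hold time: {total} s ({total / 60:.1f} min)")
```

The holds alone come to 3690 seconds, or 61.5 minutes; an actual run takes longer because the cycler needs time to change temperature. 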

Restriction Analysis. 10 μl of the 50 μl PCR reaction was 
digested with 10 units of BstUI (New England Biolabs) for 
4 hours according to conditions specified by the 
manufacturer. Restriction digests were ethanol precipitated prior to 
gel analysis. 

EXAMPLE 2 

An initial study was required to validate the strategy for 
MSP for providing assessment of the methylation status of 
CpG islands. The p16 tumor suppressor (Merlo, et al., supra; 
Herman, et al., Cancer Research, 55:4525, 1995; 
Gonzalez-Zulueta, et al., Cancer Res., 55:4531, 1995), for which 
hypermethylation of a 5' CpG island has been documented to be 
associated with complete loss of gene expression in 
many cancer types, was used as an exemplary gene to 
determine whether the density of methylation, in key regions 
to be tested, was great enough to facilitate the primer design 
disclosed herein. Other than for CpG sites located in 
recognition sequences for methylation-sensitive enzymes, the 
density of methylation and its correlation to transcriptional 
silencing had not yet been established. The genomic 
sequencing technique was therefore employed to explore 
this relationship. 

FIG. 1 shows genomic sequencing of p16. The sequence 
shown has the most 5' region at the bottom of the gel, 
beginning at +175 in relation to a major transcriptional start 
site (Hara, et al., Mol. Cell Biol., 16:859, 1996). All 
cytosines in the unmethylated cell line H249 have been 
converted to thymidine, while all C's in CpG dinucleotides 
in the methylated cell line H157 remain as C, indicating 
methylation. The bracket ( ] ) encloses a BstUI site which is at -59 in 
relation to the translational start site in Genbank sequence U12818 
(Hussussian, et al., Nat. Genet., 8:15, 1994), but which is 
incorrectly identified as CGCA in sequence X94154 (Hara, 
et al., supra). This CGCG site represents the 3' location of 
the sense primer used for p16 MSP. 

As has been found for other CpG islands examined in this 
manner (Myohanen, et al., supra; Park, et al., Mol. Cell Biol., 
14:7975, 1994; Reeben, et al., Gene, 157:325, 1995), the 
CpG island of p16 was completely unmethylated in those 
cell lines and normal tissues previously found to be 
unmethylated by Southern analysis (FIG. 1) (Merlo, et al., supra; 
Herman, et al., supra). However, it was extensively 
methylated in cancer cell lines shown to be methylated by 
Southern analysis (FIG. 1). In fact, all cytosines within CpG 
dinucleotides in this region were completely methylated in 
the cancers lacking p16 transcription. This marked 
difference in sequence following bisulfite treatment suggested that 
the method of the invention for specific amplification of 
either methylated or unmethylated alleles was useful for 
identification of methylation patterns in a DNA sample. 

Primers were designed to discriminate between 
methylated and unmethylated alleles following bisulfite treatment, 
and to discriminate between DNA modified by bisulfite and 
that which had not been modified. To accomplish this, 
primer sequences were chosen for regions containing 
frequent cytosines (to distinguish unmodified from modified 
DNA), and CpG pairs near the 3' end of the primers (to 
provide maximal discrimination in the PCR reaction 
between methylated and unmethylated DNA). Since the two 
strands of DNA are no longer complementary after bisulfite 
treatment, primers can be designed for either modified 
strand. For convenience, primers were designed for the 
sense strand. The fragment of DNA to be amplified was 
intentionally small, to allow the assessment of methylation 
patterns in a limited region and to facilitate the application 
of this technique to samples, such as paraffin blocks, where 
amplification of larger fragments is not possible. In Table 1, 
primer sequences are shown for all genes tested, 
emphasizing the differences in sequence between the three types of 
DNA which are exploited for the specificity of MSP. The 
multiple mismatches in these primers which are specific for 
these different types of DNA suggest that each primer set 
should provide amplification only from the intended 
template. 
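
The sequence differences exploited by the three primer types can be illustrated with a small in silico conversion. The Python sketch below is not part of the original disclosure and makes simplifying assumptions: bisulfite conversion is treated as complete, every cytosine outside a CpG is treated as unmethylated, and all CpG cytosines of an allele are either methylated or unmethylated. The template is the p16 wild-type sense primer of Table 1 followed by one G, which is assumed here because the text places a CGCG site at the 3' end of the sense primer. The names bisulfite_convert and mismatches are hypothetical. 

```python
def bisulfite_convert(sense_strand: str, cpg_methylated: bool) -> str:
    """Model the sense strand after bisulfite treatment and amplification.

    Unmethylated C reads as T. A C that is followed by G (a CpG) is kept
    as C only when the allele is methylated.
    """
    seq = sense_strand.upper()
    converted = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and seq[i + 1:i + 2] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# p16-W sense primer plus the final G of the CGCG site (assumption).
TEMPLATE = "CAGAGGGTGGGGCGGACCGCG"
PRIMER_LENGTH = 20  # length of the wild-type sense primer

wild_type = TEMPLATE[:PRIMER_LENGTH]
methylated = bisulfite_convert(TEMPLATE, cpg_methylated=True)[:PRIMER_LENGTH]
unmethylated = bisulfite_convert(TEMPLATE, cpg_methylated=False)[:PRIMER_LENGTH]

print("unmodified  :", wild_type)
print("methylated  :", methylated)
print("unmethylated:", unmethylated)
print("methylated vs unmethylated mismatches:",
      mismatches(methylated, unmethylated))
```

Under these assumptions the two converted sequences correspond to the 3' twenty nucleotides of the p16-M and p16-U sense primers of Table 1, and the mismatches between them fall at the CpG positions, including the 3' terminal base. 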

The primers designed for p16 were tested with DNA from 
cancer cell lines and normal tissues for which the 
methylation status had previously been defined by Southern analysis 
(Merlo, et al., supra; Herman, et al., supra). 

FIG. 2, panels A-D, show polyacrylamide gels with the 
Methylation Specific PCR products of p16. Primer sets used 
for amplification are designated as unmethylated (U), 
methylated (M), or unmodified/wild-type (W). * designates 
the molecular weight marker pBR322-MspI digest. Panel A 
shows amplification of bisulfite-treated DNA from cancer 
cell lines and normal lymphocytes, and untreated DNA 
(from cell line H249). Panel B shows mixing of various 
amounts of H157 DNA with 1 μg of H249 DNA prior to 
bisulfite treatment to assess the detection sensitivity of MSP 
for methylated alleles. Modified DNA from a primary lung 
cancer sample and normal lung are also shown. Panel C 
shows amplification with the p16-U2 (U) primers, and 
p16-M2 (M) described in Table 1. Panel D shows the 
amplified p16 products of panel C restricted with BstUI (+) 
or not restricted (-). In all cases, the primer set used 
confirmed the methylation status determined by Southern 
analysis. For example, lung cancer cell lines U1752 and 
H157, as well as other cell lines methylated at p16, amplified 
only with the methylated primers (FIG. 2, panel A). DNA 
from normal tissues (lymphocytes, lung, kidney, breast and 
colon) and the unmethylated lung cancer cell lines H209 and 
H249, amplified only with unmethylated primers (examples 
in FIG. 2, panel A). PCR with these primers could be 
performed with or without 5% DMSO. DNA not treated with 
bisulfite (unmodified) failed to amplify with either of the 
methylated or unmethylated specific primers, but readily 
amplified with primers specific for the sequence prior to 
modification (FIG. 2, panel A). DNA from the cell line H157 
after bisulfite treatment also produced a weaker 
amplification with unmodified primers, suggesting an incomplete 
bisulfite reaction. However, this unmodified DNA, unlike 
partially restricted DNA in previous PCR assays relying on 
methylation sensitive restriction enzymes, is not recognized 
by the primers specific for methylated DNA. It therefore 
does not provide a false positive result or interfere with the 
ability to distinguish methylated from unmethylated alleles. 

The sensitivity of MSP for detection of methylated p16 
alleles was assessed. DNA from methylated cell lines was 
mixed with unmethylated DNA prior to bisulfite treatment. 
0.1% of methylated DNA (approximately 50 pg) was 
consistently detected in an otherwise unmethylated sample 
(FIG. 2, panel B). The sensitivity limit for the amount of 
input DNA was determined to be as little as 1 ng of human 
DNA, mixed with salmon sperm DNA as a carrier, detectable 
by MSP. 
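
The figure of approximately 50 pg is consistent with 0.1% of the roughly 50 ng of bisulfite-modified DNA used per PCR reaction in Example 1. The short Python sketch below, which is not part of the original disclosure, works through that arithmetic; the function name methylated_mass_pg is hypothetical, and the assumption that the 0.1% applies to a 50 ng reaction input is an interpretation of the text. 

```python
def methylated_mass_pg(input_dna_ng: float, methylated_fraction: float) -> float:
    """Mass of methylated DNA in a reaction, in picograms (1 ng = 1000 pg)."""
    return input_dna_ng * methylated_fraction * 1000.0


# Assumed inputs: ~50 ng of modified DNA per reaction, 0.1% of it methylated.
reaction_input_ng = 50.0
fraction_methylated = 0.001

mass_pg = methylated_mass_pg(reaction_input_ng, fraction_methylated)
print(f"methylated DNA per reaction: about {mass_pg:.0f} pg")
```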



Fresh human tumor samples often contain normal and 
tumor tissue, making the detection of changes specific for 
the tumor difficult. However, the sensitivity of MSP suggests 
it would be useful for primary tumors as well, allowing for 
detection of aberrantly methylated alleles even if they 
contribute relatively little to the overall DNA in a sample. In 
each case, while normal tissues were completely 
unmethylated, tumors determined to be methylated at p16 by 
Southern analysis also contained methylated DNA detected 
by MSP, in addition to some unmethylated alleles (examples 
in FIG. 2, panel B). DNA from paraffin-embedded tumors 
was also used, and allowed the detection of methylated and 
unmethylated alleles in these samples (FIG. 2, panel B). To 
confirm that these results were not unique to this primer set, 
a second downstream primer for p16 was used which would 
amplify a slightly larger fragment (Table 1). This second set 
of primers reproduced the results described above (FIG. 2, 
panel C), confirming the methylation status defined by 
Southern blot analysis. 

To further verify the specificity of the primers for the 
methylated alleles and to check specific cytosines for 
methylation within the region amplified, the differences in 
sequence between methylated/modified DNA and 
unmethylated/modified DNA were utilized. Specifically, the 
BstUI recognition site, CGCG, will remain CGCG after 
bisulfite treatment and amplification if both C's are 
methylated, but will become TGTG if unmethylated. 
Digestion of the amplified products with BstUI distinguishes 
these two products. Restriction of p16 amplified products 
illustrates this. Only unmodified products and methylated/ 
modified products, both of which retain the CGCG site, were 
cleaved by BstUI, while products amplified with 
unmethylated/modified primers failed to be cleaved (FIG. 2, 
panel D). 
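
The restriction check can be pictured with a minimal in silico digest. The Python sketch below is illustrative only and is not part of the original disclosure. It assumes that BstUI cuts in the middle of its CGCG recognition site, and it uses two short sequences in place of full amplified products: the 3' twenty nucleotides of the p16-M and p16-U sense primers of Table 1, each extended by one G. The function name bstui_fragments is hypothetical. 

```python
BSTUI_SITE = "CGCG"  # recognition sequence; the cut is taken as CG^CG


def bstui_fragments(amplicon: str) -> list:
    """Split a sequence at every CGCG site, cutting between CG and CG."""
    seq = amplicon.upper()
    fragments = []
    start = 0
    pos = seq.find(BSTUI_SITE)
    while pos != -1:
        cut = pos + 2  # cut position lies in the middle of the site
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(BSTUI_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Stand-ins for amplified products (assumption: primer-derived sequence only).
methylated_product = "TAGAGGGTGGGGCGGATCGCG"    # CGCG retained
unmethylated_product = "TAGAGGGTGGGGTGGATTGTG"  # CGCG converted to TGTG

for name, product in (("methylated", methylated_product),
                      ("unmethylated", unmethylated_product)):
    sizes = [len(fragment) for fragment in bstui_fragments(product)]
    print(f"{name}: fragment lengths {sizes}")
```

The methylated stand-in is split into two fragments, while the unmethylated stand-in, which no longer contains CGCG, is returned uncut. 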

The primer sets discussed above were designed to 
discriminate heavily methylated CpG islands from 
unmethylated alleles. To do this, both the upper (sense) and lower 
(antisense) primers contained CpG sites which could 
produce methylation-dependent sequence differences after 
bisulfite treatment. MSP might be employed to examine 
more regional aspects of CpG island methylation. To 
examine this, methylation-dependent differences in the sequence 
of just one primer were tested to determine whether it would 
still allow discrimination between unmethylated and 
methylated p16 alleles. The antisense primer used for genomic 
sequencing, 5'-CTACCTAATTCCAATTCCCCTACA-3' 
(SEQ ID NO:53), was also used as the antisense primer, 
since the region recognized by the primer contains no CpG 
sites, and was paired with either a methylated or 
unmethylated sense primer (Table 1). Amplification of the 313 bp 
PCR product only occurred with the unmethylated sense 
primer in H209 and H249 (unmethylated by Southern) and 
the methylated sense primer in H157 and U1752 
(methylated by Southern), indicating that methylation of 
CpG sites within a defined region can be recognized by 
specific primers and distinguish between methylated and 
unmethylated alleles (FIG. 2, panel E). Panel E shows 
results of testing for regional methylation of CpG islands 
with MSP, using sense primers p16-U2 (U) and p16-M2 
(M), which are methylation specific, and an antisense primer 
which is not methylation specific. 

EXAMPLE 3 

The above experiments with p16 were extended to include 
3 other genes transcriptionally silenced in human cancers by 
aberrant hypermethylation of 5' CpG islands. 

FIG. 3, panels A-E, show polyacrylamide gels of MSP 
products from analysis of several genes. Primer sets used for 
amplification are designated as unmethylated (U), 
methylated (M), or unmodified/wild-type (W). * designates 
the molecular weight marker pBR322-MspI digest and ** 
designates the 123 bp molecular weight marker. All DNA 
samples were bisulfite treated except those designated 
untreated. Panel A shows the results from MSP for p15. 
Panel B shows the p15 products restricted with BstUI (+) or 
not restricted (-). Panel C shows the products of MSP for 
VHL. Panel D shows the VHL products restricted with 
BstUI (+) or not restricted (-). Panel E shows the products of 
MSP for E-cadherin. 

The cyclin-dependent kinase inhibitor p15 is aberrantly 
methylated in many leukemic cell lines and primary 
leukemias (Herman, et al., supra). For p15, MSP again verified the 
methylation status determined by Southern analysis. Thus, 
normal lymphocytes and cancer cell lines SW48 and U1752, 
all unmethylated by Southern analysis (Herman, et al., 
supra), only amplified with the unmethylated set of primers, 
while the lung cancer cell line H1618 and leukemia cell line 
KG1A amplified only with the methylated set of primers 
(FIG. 3, panel A), consistent with previous Southern 
analysis results (Herman, et al., supra). The cell line Raji 
produced a strong PCR product with methylated primers and a 
weaker band with unmethylated primers. This was the same 
result for methylation obtained previously by Southern 
analysis (Herman, et al., supra). Non-cultured leukemia 
samples, like the primary tumors studied for p16, had 
amplification with the methylated primer set as well as the 
unmethylated set. This heterogeneity also matched Southern 
analysis (Herman, et al., supra). Again, as for p16, 
differential modification of BstUI restriction sites in the amplified 
product of p15 was used to verify the specific amplification 
by MSP (FIG. 3, panel B). Amplified products using 
methylated primer sets from cell lines H1618 and Raji or 
unmodified primer sets, were completely cleaved by BstUI, while 
unmethylated amplified products did not cleave. Primary 
AML samples, which again only demonstrated cleavage in 



Aberrant CpG island promoter region methylation is 
associated with inactivation of the VHL tumor suppressor 
gene in approximately 20% of clear renal carcinomas 
(Herman, et al., Proc. Natl. Acad. Sci. USA, 91:9700, 1994). 
This event, like mutations for VHL (Gnarra, et al., Nature 
Genetics, 7:85, 1994), is restricted to clear renal cancers 
(Herman, et al., supra). Primers designed for the VHL 
sequence were used to study DNA from the renal cell cancer 
line RFX393 which is methylated at VHL by Southern 
analysis, and the lung cancer cell line U1752 which is 
unmethylated at this locus (Herman, et al., supra). In each 
case, the methylation status of VHL determined by MSP 
confirmed that found by Southern analysis (FIG. 3, panel C), 
and BstUI restriction site analysis validated the PCR product 
specificity (FIG. 3, panel D). 

The expression of the invasion/metastasis suppressor 
gene, E-cadherin, is often silenced by aberrant methylation 
of the 5' promoter in breast, prostate, and many other 
carcinomas (Graff, et al., supra; Yoshira, et al., Proc. Natl. 
Acad. Sci. USA, 92:7416, 1995). Primers were designed for 
the E-cadherin promoter region to test the use of MSP for 
this gene. In each case, MSP analysis paralleled Southern 
blot analysis for the methylation status of the gene (Graff, et 
al., supra). The breast cancer cell lines MDA-MB-231 and 
Hs578t, and the prostate cancer cell lines DuPro and 
TSUPrl, all heavily methylated by Southern, displayed 
prominent methylation. MCF7, T47D, PC-3, and LNCaP, all 
unmethylated by Southern, showed no evidence for 
methylation in the sensitive MSP assay (FIG. 3, panel E). MSP 
analysis revealed the presence of unmethylated alleles in 
Hs578t, TSUPrl and DuPro consistent with a low percentage 
of unmethylated alleles in these cell lines previously 
detected by Southern analysis (Graff, et al., supra). BstUI 
restriction analysis again confirmed the specificity of the 
PCR amplification. 



TABLE 1 

PCR primers used for Methylation Specific PCR 

Primer                                                         Size  Anneal  Genomic 
Set      Sense primer* (5'-3')      Antisense primer* (5'-3')  (bp)  temp.   Position† 

p16-W‡   CAGAGGGTGGGGCGGACCGC       CGGGCCGCGGCCGTGG           140   65° C.  +171 
p16-M    TTATTAGAGGGTGGGGCGGATCGC   GACCCCGAACCGCGACCGTAA      150   65° C.  +167 
p16-U    TTATTAGAGGGTGGGGTGGATTGT   CAACCCCAAACCACAACCATAA     151   60° C.  +167 
p16-M2   TTATTAGAGGGTGGGGCGGATCGC   CCACCTAAATCGACCTCCGACCG    234   65° C.  +167 
p16-U2   TTATTAGAGGGTGGGGTGGATTGT   CCACCTAAATCAACCTCCAACCA    234   60° C.  +167 
p15-W    CGCACCCTGCGGCCAGA          AGTGGCCGAGCGGCCGG          137   65° C.  +46 
p15-M    GCGTTCGTATTTTGCGGTT        CGTACAATAACCGAACGACCGA     148   60° C.  +40 
p15-U    TGTGATGTGTTTGTATTTTGTGGTT  CCATACAATAACCAAACAACCAA    154   60° C.  +34 
VHL-M    TGGAGGATTTTTTTGCGTACGC     GAACCGAACGCCGCGAA          158   60° C.  -116 
VHL-U    GTTGGAGGATTTTTTTGTGTATGT   CCCAAACCAAACACCACAAA       165   60° C.  -118 
Ecad-M   TTAGGTTAGAGGGTTATCGCGT     TAACTAAAAATTCACCTACCGAC    116   57° C.  -205 
Ecad-U   TAATTTTAGGTTAGAGGGTTATTGT  CACAACCAATCAACAACACA        97   53° C.  -210 

*Sequence differences between modified primers and unmodified DNA are boldface, and differences between methylated/ 
modified and unmethylated/modified are underlined. 

†Primers were placed near the transcriptional start site. Genomic position is the location of the 5' nucleotide of the sense primer 
in relation to the major transcriptional start site defined in the following references and Genbank accession numbers: p16 (most 
3' site) X94154 (E. Hara, et al., Mol. Cell Biol., 16:859, 1996), p15 S75756 (J. Jen, et al., Cancer Res., 54:6353, 1994), VHL 
U19763 (I. Kuzmin, et al., Oncogene, 10:2185, 1995), and E-cadherin L34545 (M. J. Bussemakers, et al., Biochem. Biophys. Res. 
Commun., 203:1284, 1994). 

‡W represents unmodified, or wild-type primers, M represents methylated-specific primers, and U represents unmethylated- 
specific primers. (SEQ ID NO: 26-49) 



the methylated product, had less complete cleavage. This 
suggests a heterogeneity in methylation, arising because in 
some alleles, many CpG sites within the primer sequence 
area are methylated enough to allow the methylation specific 
primers to amplify this region, while other CpG sites are not 
completely methylated. 



Although the invention has been described with reference 
to the presently preferred embodiments, it should be 
understood that various modifications can be made without 
departing from the spirit of the invention. Accordingly, the 
invention is limited only by the following claims. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(iii) NUMBER OF SEQUENCES: 52 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GCGGTCCGCC CCACCCTCTG 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 16 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

CCACGGCCGC GGCCCG 

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

GCGATCCGCC CCACCCTCTA ATAA 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

TTACGGTCGC GGTTCGGGGT C 

(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

ACAATCCACC CCACCCTCTA ATAA 

(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

TTATGGTTGT GGTTTGGGGT TG 

(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

GCGATCCGCC CCACCCTCTA ATAA 

(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

CGGTCGGAGG TCGATTTAGG TGG 

(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

ACAATCCACC CCACCCTCTA ATAA 



(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

TGGTTGGAGG TTGATTTAGG TGG 

(2) INFORMATION FOR SEQ ID NO:11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: 

TCTGGCCGCA GGGTGCG 

(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 

CCGGCCGCTC GGCCACT 

(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

AACCGCAAAA TACGAACGC 



(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 

TCGGTCGTTC GGTTATTGTA CG 

(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

AACCACAAAA TACAAACACA TCACA 

(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

TTGGTTGTTT GGTTATTGTA TGG 23 

(2) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 

GCGTACGCAA AAAAATCCTC CA 22 

(2) INFORMATION FOR SEQ ID NO:18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 

TTCGCGGCGT TCGGTTC 17 

(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: 

ACATACACAA AAAAATCCTC CAAC 24 

(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

TTTGTGGTGT TTGGTTTGGG 20 

(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

ACGCGATAAC CCTCTAACCT AA 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

GTCGGTAGGT GAATTTTTAG TTA 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

ACAATAACCC TCTAACCTAA AATTA 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

TGTGTTGTTG ATTGGTTGTG 

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

CAGAGGGTGG GGCGGACCGC 



( 2 ) INFORMATION FOR SEQ ID NO:26: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 16 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

CGGGCCGCGG CCGTGG 



( 2 ) INFORMATION FOR SEQ ID NO:27: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

TTATTAGAGG GTGGGGCGGA TCGC 



( 2 ) INFORMATION FOR SEQ ID NO:28: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 21 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

GACCCCGAAC CGCGACCGTA A 



( 2 ) INFORMATION FOR SEQ ID NO:29: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

TTATTAGAGG GTGGGGTGGA TTGT 



( 2 ) INFORMATION FOR SEQ ID NO:30: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 22 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

CAACCCCAAA CCACAACCAT AA 



( 2 ) INFORMATION FOR SEQ ID NO:31: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

TTATTAGAGG GTGGGGTGGA TTGT 



( 2 ) INFORMATION FOR SEQ ID NO:32: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 23 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

CCACCTAAAT CGACCTCCGA CCG 23 

( 2 ) INFORMATION FOR SEQ ID NO:33: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

TTATTAGAGG GTGGGGTGGA TTGT 24 

( 2 ) INFORMATION FOR SEQ ID NO:34: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 23 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

CCACCTAAAT CAACCTCCAA CCA 23 

( 2 ) INFORMATION FOR SEQ ID NO:35: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

CGCACCCTGC GGCCAGA 17 

( 2 ) INFORMATION FOR SEQ ID NO:36: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

AGTGGCCGAG CGGCCGG 17 

( 2 ) INFORMATION FOR SEQ ID NO:37: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 19 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

GCGTTCGTAT TTTGCGGTT 



( 2 ) INFORMATION FOR SEQ ID NO:38: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 22 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

CGTACAATAA CCGAACGACC GA 



( 2 ) INFORMATION FOR SEQ ID NO:39: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 25 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

TGTGATGTGT TTGTATTTTG TGGTT 



( 2 ) INFORMATION FOR SEQ ID NO:40: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 23 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

CCATACAATA ACCAAACAAC CAA 



( 2 ) INFORMATION FOR SEQ ID NO:41: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 22 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

TGGAGGATTT TTTTGCGTAC GC 



( 2 ) INFORMATION FOR SEQ ID NO:42: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

GAACCGAACG CCGCGAA 



( 2 ) INFORMATION FOR SEQ ID NO:43: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

GTTGGAGGAT TTTTTTGTGT ATGT 



( 2 ) INFORMATION FOR SEQ ID NO:44: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 20 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

CCCAAACCAA ACACCACAAA 



( 2 ) INFORMATION FOR SEQ ID NO:45: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 22 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

TTAGGTTAGA GGGTTATCGC GT 



( 2 ) INFORMATION FOR SEQ ID NO:46: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 23 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:46: 



( 2 ) INFORMATION FOR SEQ ID NO:47: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 25 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

TAATTTTAGG TTAGAGGGTT ATTGT 



( 2 ) INFORMATION FOR SEQ ID NO:48: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 20 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

CACAACCAAT CAACAACACA 



( 2 ) INFORMATION FOR SEQ ID NO:49: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:49: 

TTTTTAGAGG ATTTGAGGGA TAGG 



( 2 ) INFORMATION FOR SEQ ID NO:50: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 24 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:50: 

CTACCTAATT CCAATTCCCC TACA 



( 2 ) INFORMATION FOR SEQ ID NO:51: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 41 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:51: 

GTTTTCCCAG TCACGACAGT ATTAGGAGGA ACAAAGAGGA G 



( 2 ) INFORMATION FOR SEQ ID NO:52: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 21 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:52: 



What is claimed is: 

1. A method for detecting a methylated CpG-containing 
nucleic acid comprising: 

contacting a nucleic acid-containing specimen with an 
agent that modifies unmethylated cytosine, 

amplifying the CpG-containing nucleic acid in the speci- 
men by means of CpG-specific oligonucleotide 
primers, wherein the oligonucleotide primers distin- 
guish between modified methylated and nonmethylated 
nucleic acid, and 

detecting the methylated nucleic acid based on the pres- 
ence or absence of amplification products produced in 
said amplifying step. 

2. The method of claim 1, wherein the amplifying step is 
the polymerase chain reaction (PCR). 

3. The method of claim 1, wherein the modifying agent is 
bisulfite. 

4. The method of claim 1, wherein cytosine is modified to 
uracil. 

5. The method of claim 1, wherein the CpG-containing 
nucleic acid is in a promoter region. 

6. The method of claim 5, wherein the promoter is a tumor 
suppressor gene promoter. 

7. The method of claim 6, wherein the tumor suppressor 
gene is selected from the group consisting of p16, p15, 
E-cadherin, and VHL. 

8. The method of claim 1, wherein the specimen is from 
a tissue selected from the group consisting of brain, colon, 
urogenital, lung, renal, hematopoietic, breast, thymus, testis, 
ovarian, and uterine. 

9. The method of claim 1, further comprising contacting 
the nucleic acid with a methylation sensitive restriction 
endonuclease. 

10. The method of claim 9, wherein the restriction endo- 
nuclease is selected from the group consisting of MspI, 
HpaII, BssHII, BstUI and NotI. 

11. The method of claim 1, wherein the presence of 
methylated CpG-containing nucleic acid in the specimen is 
indicative of a cell proliferative disorder. 

12. The method of claim 11, wherein the disorder is 
selected from the group consisting of low grade 
astrocytoma, anaplastic astrocytoma, glioblastoma, 
medulloblastoma, colon cancer, lung cancer, renal cancer, 
leukemia, breast cancer, prostate cancer, endometrial cancer 
and neuroblastoma. 

13. The method of claim 1, wherein the primer hybridizes 
with a target polynucleotide sequence having the sequence 
selected from the group consisting of SEQ ID NO: 1-23 and 
SEQ ID NO:24. 

14. The method of claim 1, wherein the primers are 
selected from the group consisting of SEQ ID NO:25-47 
and SEQ ID NO:48. 

15. A kit useful for the detection of a methylated CpG- 
containing nucleic acid comprising carrier means being 
compartmentalized to receive in close confinement therein 
one or more containers comprising a first container contain- 
ing a reagent which modifies unmethylated cytosine and a 
second container containing primers for amplification of the 
CpG-containing nucleic acid, wherein the primers distin- 
guish between modified methylated and nonmethylated 
nucleic acid. 

16. The kit of claim 15, wherein the modifying reagent is 
bisulfite. 

17. The kit of claim 15, wherein said reagent modifies 
cytosine to uracil. 

18. The kit of claim 15, wherein the primer hybridizes 
with a target polynucleotide sequence having the sequence 
selected from the group consisting of SEQ ID NO: 1-23 and 
SEQ ID NO:24. 

19. The kit of claim 15, wherein the primers are selected 
from the group consisting of SEQ ID NO:25-47 and 48. 



20. Isolated oligonucleotide primer(s) for detection of a 
methylated CpG-containing nucleic acid wherein the primer 
hybridizes with a target polynucleotide sequence having the 
sequence selected from the group consisting of SEQ ID 
NO: 1-23 and SEQ ID NO:24. 

21. The primers of claim 20, wherein the primer pairs are 
selected from the group consisting of SEQ ID NO:25-47 
and 48. 

22. A kit for the detection of methylated CpG-containing 
nucleic acid from a sample comprising: 

a) a reagent that modifies unmethylated cytosine nucle- 
otides; 

b) a wild-type unmodified control nucleic acid; 

c) primers for the amplification of unmethylated CpG- 
containing nucleic acid; 

d) primers for the amplification of methylated CpG- 
containing nucleic acid; and 

e) primers for the amplification of control unmodified 
nucleic acid, 

wherein the primers for the amplification of unmethylated 
CpG-containing nucleic acid and methylated CpG- 
containing nucleic acid distinguish between modified 
methylated and nonmethylated nucleic acid. 

23. The kit of claim 22, further comprising nucleic acid 
amplification buffer. 

24. The kit of claim 22, wherein the reagent that modifies 
unmethylated cytosine is bisulfite. 

25. The kit of claim 22, wherein primers hybridize with a 
target polynucleotide sequence having the sequence selected 
from the group consisting of SEQ ID NO: 1-23 and SEQ ID 
NO:24. 

26. The kit of claim 22, wherein the primers are selected 
from the group consisting of SEQ ID NO:25-47 and SEQ ID 
NO:48. 

27. A method for detecting a methylated CpG-containing 
nucleic acid comprising: 

contacting a nucleic acid-containing specimen with 
bisulfite to modify unmethylated cytosine, 

amplifying the CpG-containing nucleic acid in the speci- 
men by means of CpG-specific oligonucleotide 
primers, wherein the oligonucleotide primers distin- 
guish between modified methylated and nonmethylated 
nucleic acid, and 

detecting the methylated nucleic acid based on the pres- 
ence or absence of amplification products produced in 
said amplifying step. 
1 atggcggcagggtccactacgctgcgcgcagtggggaagctgcag 

MAAGSTTLRAVGKLQ 
4 6 gt:gcgt:ct:ggccact.&agacggragccgaaaaagct;agagaaat;at: 

VRLATKTEPKKLEKY 
9 1 tt.gca.gaaactic^ccgcctt.gcccatgaccgcaga.cat.cct^gcg 

L, QKLSALPMTADI LA 
13 6 g-ags.ct.g-gaatcaga.aagacggt^gaagcgcctgcggaagcacca.g 

ETGI RKTVKRLRKHQ 
181 cacgtgggcgactttgccagagacttagcggcccggtggaagaag 

HVGD FARD LAARW KK 
22 6 ctggtgctcgtggaccgaaacaccgggccrtgacccgcaggaccct: 

I-VLVDRNTGPDPQDP 
271 gaggagagcgcttcccgacagcgcttcggggaggctcttcaggag 

EESASRQRFGEALQE 
316 cgggaaaaggcctggggcttcccagaaaacgcgacggcccccagg 

REKAWG FPENATAPR 
361 agcccatctcacagccctgagcacagacggacagcacgcagaaca 

S PSHSPEHRRTARRT 
4 06 cctccggggcaacagagacctcacccgaggtctcccagtcgcgag 

PPGQORPHPRSPSRE 
4 51 ^cccagagccgagagaaagcgccccagaatggccccagctgat tec 

P RA^ RK~R P RMA PA a-S"- 
49C> ^ ^ -cccca tcgggaccctccaacgcgcaccgct cccctcccgatg 

GPHRDPPTRTAPLPM 
54 1 cccgagggccctgagcccgctgtgcccggggagcaacccggaaga 

PEGPEPAVPGEQPGR 
586 ggccacgctcacgccgctcagggcgggcctccgctgggtcaaggc 

GHA HAAQGGPL'LGQG 
631 cgccagggccaaccccagggggaagcggtggggagccacagcaag 

C QG Q P'QGE'aVGS H S K 
676 gggcaca aa teg t cccg egg ggctt egg ctcagaaatcgcct eet 

GHKSSRGASAQKSPP 
721 gtLCcaggaaagccag tcagagaggctgcaggcggccggcgct gat 

VQESOSERLQAAGAD 
7 66 tccgccgggccgaaaacgg t gcccagccat gtctt ct cggagctc 

SAGPKTVPSHVFSEL 
811 tgggacccctcagaggcctggatgcaggccaactacgatctgctg 

WDPSEAWMQANYDLL 

856 tccgcttttgaggccatgacctcccaggcaaacccagaagcactc 

SAFEAMTSQANPEAL - — 

901 tccgcgccagcgctccaggaggaagctgct ttccctggacgcaga I i 

SAPALQEEAAFPGRR 
94 6 gtgaacgctaagatgccggt gtactcgggcuccaggcctgcctgc 

VNAKMPVYSGSRPAC 
9 91 cagctccaggtgccgacgct gcgccegcag r^gcct ccgggtgcct 

QLQVPTLROOCLRVP 
1036 aggaacaa tccgga cgccc t egg eg a eg tggasqgggt cccctac 

RNN PDALGDVEGVpy 
108 1 tcggt tcttgaacccgt tct ggaagggt gg acgcccga tcagctg 

SVLEPVLEGWTPDOL 
112 6 ta cc gca ca g ag a aaga caa tgccgcactcgct eg agagaca gat 

YRTEKDNAALARETD 
1171 gaatcacggaggacccattgcctccaggacctcaaggaagaaaag 

ELWRIHCLQDFKEEK 
121 6 ccacaggagcacgagtcttggcgggagctgtacctgcggcttcgg 

PQEHESWRELYLRLR 
1261 gacgcccgagagcagcggctgcgagtiagtgaccacgaaaatccga 

DAREQRLRVVTTKI R 
1306 tcegeacgtgaaaacaaacecagcggccgacegacaaagatgatc 

SARENKPSGRQTKMI 
1351 tgtctcaactctgtggccaagacgccttatgstgcttccaggagg 

C FNSVAKTPYDASRR 
1396 ca aga gaagt ctgcaggagccgctgaccccggaaa tggagag a tg 

QEKSAGAADPGNGEM 
14 4 1 Gagccagcccccaagcccgcaggaagcagccaggccccccccggc 

EPAPKPAGSSQAPSG 
14 8 6 ct cggggacggcgacggcggcagcg tgagcgccggcggcagcagc 

LGDGDGGSVSGGGSS 
1531 aa ccggcacgcggcgcccgcggacaaaacccgaaa acaggct gcc 

NRHAAPADKTRKOAA 
157 6 aagaaagtggccccgctgatggccaaggcaattcgagactacaag 

KKVAPLMAKA IRDY K 
1621 ggaagat tctcccgacgat aa 1884 

G R P S R R * 
METHODS FOR ANALYZING METHYLATED CPG 
ISLANDS AND GC RICH REGIONS 

RELATED APPLICATION DATA 

[0001] This application claims the benefit of U.S. Provi- 
sional Application Serial No. 60/338,888 filed Nov. 30, 
2001, the entire contents of which are incorporated herein by 
reference. 

STATEMENT OF GOVERNMENT SUPPORT 

[0002] This invention was made in part with government 
support under Grant No. CA65145 awarded by the National 
Institutes of Health. The government may have certain rights 
in this invention. 

BACKGROUND OF THE INVENTION 
[0003] 1. Field of the Invention 

[0004] The present invention relates generally to methy- 
lation of genomic DNA and more specifically to the iden- 
tification of sequences normally methylated in the genome 
and their relationship to disease states. 

[0005] 2. Background Information 

[0006] DNA methylation is central to many mammalian 
processes including embryonic development, X-inactiva- 
tion, genomic imprinting, regulation of gene expression, and 
host defense against parasitic sequences, as well as abnormal 
processes such as carcinogenesis, fragile site expression, and 
cytosine to thymine transition mutations. DNA methylation 
in mammals is achieved by the transfer of a methyl group 
from S-adenosyl-methionine to the C5 position of cytosine. 
This reaction is catalyzed by DNA methyltransferases and is 
specific to cytosines in CpG dinucleotides. Seventy percent 
of all cytosines in CpG dinucleotides in the human genome 
are methylated and prone to deamination, resulting in a 
cytosine to thymine transition. This process leads to an 
overall reduction in the frequency of guanine and cytosine to 
about 40% of all nucleotides and a further reduction in the 
frequency of CpG dinucleotides to about a quarter of their 
expected frequency (Bird 1986). 
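The depletion described above can be checked with simple arithmetic. The following is only an illustrative sketch: it assumes that C and G are equally frequent and that neighboring bases are independent, neither of which is stated in the text, and the function name `expected_cpg_frequency` is hypothetical.

```python
def expected_cpg_frequency(gc_fraction):
    """Expected CpG dinucleotide frequency.

    Assumes p(C) == p(G) == gc_fraction / 2 and independent neighboring
    bases (both are simplifying assumptions, not statements from the text).
    """
    p_c = gc_fraction / 2
    p_g = gc_fraction / 2
    return p_c * p_g


expected = expected_cpg_frequency(0.40)  # G+C is about 40% of all nucleotides
observed = expected / 4                  # "about a quarter" of the expected frequency
print(f"expected CpG frequency: {expected:.3f}")
print(f"observed CpG frequency: {observed:.3f}")
```

Under these assumptions, roughly 4% of dinucleotides would be CpG by chance, and the observed genome-wide figure would be near 1%.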

[0007] The exception to CpG underrepresentation in the 
genome is CpG islands, which were first identified as Hpa II 
tiny fragments (Bird et al. 1985), and were later formally 
defined as sequences >200 bp in length, with a GC content 
>0.5, and a CpGobs/CpGexp (observed to expected ratio 
based on GC content) >0.6 (Gardiner-Garden and Frommer 
1987). CpG islands have been estimated to constitute 
1%-2% of the mammalian genome (Antequera and Bird 
1993), and are found in the promoters of all housekeeping 
genes, as well as in a less conserved position in 40% of 
genes showing tissue-specific expression (Larsen et al. 
1992). The persistence of CpG dinucleotides in CpG islands 
is largely attributed to a general lack of methylation of CpG 
islands, regardless of expression status (reviewed in Cross 
and Bird 1995). 
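The three-part CpG island definition quoted above (length >200 bp, GC content >0.5, observed/expected CpG >0.6) can be sketched as a small check. This is a minimal illustration, not the procedure used in the application: the function names are hypothetical, and the expected CpG count is computed as (number of C x number of G) / length, which is the commonly used form of the ratio and is an assumption here because the text does not spell it out.

```python
def cpg_stats(seq):
    """Return (length, GC content, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0  # assumed form of the expected count
    ratio = cpg / expected if expected else 0.0
    return n, gc_content, ratio


def is_cpg_island(seq):
    """Apply the thresholds quoted in the text: >200 bp, GC >0.5, obs/exp >0.6."""
    n, gc_content, ratio = cpg_stats(seq)
    return n > 200 and gc_content > 0.5 and ratio > 0.6
```

For example, `is_cpg_island("CG" * 150)` is true for this artificial 300-base repeat, while the same repeat truncated to 100 bases fails the length criterion.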

[0008] Although CpG islands are believed to be unmethy- 
lated, two exceptions to this rule in normal cells are the 
inactive X chromosome (Yen et al. 1984) and imprinted 
genes (Ferguson-Smith et al. 1993; Razin and Cedar 1994; 
Barlow 1995), both of which are associated with methy- 
lated CpG islands. Genomic imprinting is the parental 
origin-specific differential expression of the two alleles of a 
gene, and most imprinted genes show differential germline 
methylation of associated CpG islands (reviewed in Ohlsson 
et al. 2001). A third exception to the rule of methylation 
exclusion of CpG islands is aberrant methylation of CpG 
islands in tumors and in immortalized cultured cells, and 
such CpG island methylation is thought to contribute to 
carcinogenesis (Herman et al. 1994; Merlo et al. 1995). 

[0009] Because of the interest in DNA methylation, 
genomic imprinting, and cancer, several general approaches 
have been used to identify CpG islands that are differentially 
methylated in specific cell types, such as screening tumor- 
normal pairs for cancer-related methylation changes (Huang 
et al. 1999; Shiraishi et al. 1999; Toyota et al. 1999), or 
pronuclear transplantation to examine differential parental 
origin for imprinted genes (Hayashizaki et al. 1994; Plass et 
al. 1996). However, there are no reports of successfully 
using a systemic effort to identify unique, methylated CpG 
islands. 

[0010] There are a variety of genome scanning methods 
that have been used to identify altered methylation sites in 
cancer cells. For example, one method involves restriction 
landmark genomic scanning (Kawai et al., Mol. Cell. Biol. 
14:7421-7427, 1994), and another example involves methy- 
lation-sensitive arbitrarily primed PCR (Gonzalgo et al., 
Cancer Res. 57:594-599, 1997). Changes in methylation 
patterns at specific CpG sites have been monitored by 
digestion of genomic DNA with methylation-sensitive 
restriction enzymes followed by Southern analysis of the 
regions of interest. The digestion-Southern method is 
straightforward, but it has inherent disadvantages in 
that it requires a large amount of DNA (at least 
5 μg) and has a limited scope for analysis of CpG sites 
(as determined by the presence of recognition sites for 
methylation-sensitive restriction enzymes). Another method 
for analyzing changes in methylation patterns involves a 
PCR-based process that involves digestion of genomic DNA 
with methylation-sensitive restriction enzymes prior to PCR 
amplification (Singer-Sam et al., Nucl. Acids Res. 18:687, 
1990). However, this method has not been shown effective 
because of a high degree of false positive signals (methy- 
lation present) due to inefficient enzyme digestion or over- 
amplification in a subsequent PCR reaction. 

[0011] Genomic sequencing has been simplified for analy- 
sis of DNA methylation patterns and 5-methylcytosine dis- 
tribution by using bisulfite treatment (Frommer et al., Proc. 
Natl. Acad. Sci. USA 89:1827-1831, 1992). Bisulfite treat- 
ment of DNA distinguishes methylated from unmethylated 
cytosines, but original bisulfite genomic sequencing requires 
large-scale sequencing of multiple plasmid clones to deter- 
mine overall methylation patterns, which prevents this tech- 
nique from being commercially useful for determining 
methylation patterns in any type of a routine diagnostic 
assay. 
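How bisulfite treatment distinguishes the two cytosine states can be illustrated in silico. The sketch below is a simplified, single-strand model under stated assumptions: unmethylated cytosine is converted (to uracil, read as T after amplification) while methylated cytosine is left unchanged, conversion is treated as complete, and the function name and the 0-based `methylated` position set are hypothetical conveniences.

```python
def bisulfite_convert(seq, methylated=()):
    """Model bisulfite conversion of one DNA strand.

    seq        -- DNA string (A, C, G, T)
    methylated -- 0-based positions of methylated cytosines (hypothetical input)

    Unmethylated C is written as T (uracil is read as T after PCR);
    methylated C is kept as C. Complete conversion is assumed.
    """
    protected = set(methylated)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


print(bisulfite_convert("ACGTCG", methylated={1}))  # methylated C at index 1 survives
print(bisulfite_convert("ACGTCG"))                  # no methylation: every C converts
```

After conversion the methylated and unmethylated templates differ in sequence, which is what allows sequencing or sequence-specific primers to tell them apart.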

[0012] In addition, other techniques have been reported 
which utilize bisulfite treatment of DNA as a starting point 
for methylation analysis. These include methylation-specific 
PCR (MSP) (Herman et al., Proc. Natl. Acad. Sci. USA 
93:9821-9826, 1996); and restriction enzyme digestion of 
PCR products amplified from bisulfite-converted DNA 
(Sadri and Hornsby, Nucl. Acids Res. 24:5058-5059, 1996; 
and Xiong and Laird, Nucl. Acids. Res. 25:2532-2534, 
1997). 
[0013] PCR techniques have been developed for detection 
of gene mutations (Kuppuswamy et al., Proc. Natl. Acad. 
Sci. USA 88:1143-1147, 1991) and quantitation of allelic- 
specific expression (Szabo and Mann, Genes Dev. 9:3097- 
3108, 1995; and Singer-Sam et al., PCR Methods Appl. 
1:160-163, 1992). Such techniques use internal primers, 
which anneal to a PCR-generated template and terminate 
immediately 5' of the single nucleotide to be assayed. 
However, an allelic-specific expression technique has not 
been tried within the context of assaying for DNA methy- 
lation patterns. 

[0014] Therefore, there remains a need for a method for 
using a systemic or genome-wide approach to identify 
unique, methylated CpG islands, GC rich regions and CpG 
dinucleotides, including normally methylated CpG 
sequences. 

SUMMARY OF THE INVENTION 

[0015] The present invention is based on the seminal 
discovery that normally methylated CpG islands or GC rich 
regions in the genome may lose methylation and this loss of 
methylation may be used to identify various diseases or 
disease states in a subject, imprinted genes and other char- 
acteristics of the genome. 

[0016] In another aspect the present invention provides a 
method for identifying a CpG island or GC rich-regulated 
gene. It should be understood that while many of the 
illustrative examples in this invention show CpG islands, the 
invention includes not only CpG islands, but also GC rich 
regions and even CpG dinucleotide sequences. Thus, 
although the term island may be referred to, the term 
includes other GC rich sequences as well. The method 
includes identifying a candidate gene located on a chromo- 
some near a CpG island and determining whether the 
expression of the candidate gene is regulated by methylation 
of the CpG island or GC rich region. In one illustrative 
example, the CpG island or GC rich regions used in the 
method include at least one of SEQ ID NO: 3-31. In certain 
embodiments, the method includes identifying the methyla- 
tion state of a CpG island or GC rich region other than SEQ 
ID NO: 8 (gDMR 3-4), which has been identified as a gDMR 
(Arima et al. 2000). 

[0017] In another aspect the present invention provides a 
method for identifying a population of CpG islands or GC 
rich regions in a genome. This aspect of the invention 
utilizes a method for isolating a library of normally methy- 
lated CpG island or GC rich regions disclosed herein. A 
method according to this aspect of the invention provides a 
genome-wide scan to identify a population of CpG islands or 
GC rich regions based on the combination of restriction 
enzymes used for the method. Therefore, a method accord- 
ing to this aspect of the invention identifies multi-copy CpG 
islands or GC rich regions within repeats as well as single 
copy CpG islands or GC rich regions. The method includes 
performing a double digestion by cleaving genomic DNA 
with both a restriction enzyme that cleaves at a recognition 
site with an AT content of greater than 50%, preferably 
greater than 75% AT, most preferably 100% AT, and a 
restriction endonuclease that cleaves at an unmethylated 
restriction site comprising greater than 50% CG, preferably 
greater than 75% GC, most preferably 100% GC, to generate 
a series of restriction fragments. The restriction 
fragments are typically size fractionated by length as dis- 
cussed below, and fragments of a specified length (e.g. 
greater than 500 base pairs) are cloned in restriction- 
negative bacteria to generate a first library. This first cloning 
step enriches for CpG islands or GC rich regions and 
eliminates unmethylated CpG islands or GC rich regions 
because of the methylcytosine sensitivity of the restriction 
enzyme that recognizes only unmethylated restriction sites. 
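The base-composition criteria for the two enzymes in the double digestion can be checked directly against recognition sequences. The sketch below uses the enzymes named in the description of FIG. 1 (Mse I, Hpa II, Eag I); the recognition sequences themselves are supplied here from general knowledge of these enzymes rather than from the text, and the helper names are hypothetical.

```python
def at_fraction(site):
    """Fraction of A/T bases in a recognition sequence."""
    s = site.upper()
    return sum(base in "AT" for base in s) / len(s)


def gc_fraction(site):
    """Fraction of G/C bases in a recognition sequence."""
    s = site.upper()
    return sum(base in "GC" for base in s) / len(s)


# Recognition sequences are an assumption of this sketch (not quoted in the text).
SITES = {"Mse I": "TTAA", "Hpa II": "CCGG", "Eag I": "CGGCCG"}

for name, site in SITES.items():
    print(f"{name}: {site}  AT={at_fraction(site):.0%}  GC={gc_fraction(site):.0%}")
```

On these sequences Mse I falls in the "most preferably 100% AT" class and Hpa II and Eag I in the "most preferably 100% GC" class, consistent with their roles in cutting between and within GC rich regions.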

[0018] In another aspect, the present invention provides an 
isolated polynucleotide that includes a nucleotide sequence 
unmethylated in nucleic acid of paternal origin and methy- 
lated in nucleic acid of maternal origin. The polynucleotide 
is about 1638 nucleotides encoding about 546 amino acids 
and has about 79% amino acid sequence identity to Elongin 
A2. In one embodiment, the polynucleotide is set forth in SEQ 
ID NO: 1. This polynucleotide appears to be polymorphic at 
position 910, which can be G or A. In another embodiment, 
the polynucleotide encodes a polypeptide as set forth in SEQ 
ID NO: 2. 
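The stated sizes are mutually consistent under the usual three-nucleotides-per-codon reading. The short check below is only a worked arithmetic example; whether the 1638-nucleotide figure counts a stop codon is not stated in the text and is not assumed here.

```python
nucleotides = 1638   # polynucleotide length given in the text
codon_length = 3     # nucleotides per codon

amino_acids, remainder = divmod(nucleotides, codon_length)
print(amino_acids, remainder)
```

A remainder of zero means the nucleotide count corresponds to a whole number of codons, matching the 546 amino acids cited.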

BRIEF DESCRIPTION OF THE DRAWINGS 

[0019] FIG. 1 illustrates an overall strategy for cloning 
methylated CpG islands. In step 1, genomic DNA is digested 
with Mse I, which cuts between CpG islands, and Hpa II, 
which cuts unmethylated CpG islands. Mse I fragments 
containing methylated CpG islands then are transformed 
into a bacterial strain that does not cut methylated DNA. 
However, brief bacterial passage leads to loss of methylation 
of these previously methylated sequences. In step 2, the 
library DNA is pooled and digested with Eag I, which cuts 
relatively large fragments within CpG islands, and these 
fragments are then subcloned. 

[0020] FIGS. 2A and 2B illustrate methylation of CpG 
islands in normal human DNA. Genomic DNA from periph- 
eral blood lymphocytes (A) or tissues (B) was digested with 
Mse I (M), Mse I+Hpa II (MH), or Mse I+Msp I (MM). 
Fragment sizes are indicated to the right. CpG islands used 
for Southern blot hybridization are indicated in panel A, and 
CpG island clone 1-19 was used in panel B. Note that there 
is an Mse I polymorphism in the fetal tissue that is not in the 
adult tissue, accounting for the presence of two bands in the 
fetal tissue Mse I digest. Blots were made in duplicate and 
one set was hybridized to RB to ensure the presence of DNA 
in the Msp I lane. BR, brain; CO, colon; KI, kidney; LI, 
liver; fCNS, fetal CNS; fKI, fetal kidney; fLU, fetal lung; 
fSK, fetal skin. 

[0021] FIGS. 3A and 3B show a series of gels showing 
differential methylation of novel gDMRs in uniparental 
tissues of germline origin. Fragment sizes (kb) are indicated 
to the right. (A) Sperm (SP), ovarian teratoma (OT), or 
complete hydatidiform mole (CHM) was digested, and 
Southern blot hybridization was performed with the gDMRs 
indicated, as described in the legend to FIG. 2. Multiple OT 
and CHM were examined with similar results, although only 
one is shown. (B) Similar experiments were performed with 
an unmethylated CpG island in the retinoblastoma gene 
(RB), with a CpG island upstream of H19 that shows 
preferential methylation of the paternal allele, and with a 
CpG island within the SNRPN gene that shows preferential 
methylation of the maternal allele. 

[0022] FIG. 4 shows a series of gels showing similar 
methylation of novel SMRs in uniparental tissues of germ- 
line origin. Experiments were performed as described in the 
legend to FIG. 2, using the SMRs indicated. Fragment sizes 
are indicated to the right. 

[0023] FIG. 5 illustrates the chromosomal location and 
relationship of representative methylated CpG islands to 
nearby genes. Genes are indicated with boxes, and the 
arrows show transcriptional orientation. The methylated 
CpG islands are shown in shading. In the case of 2-78, the 
homologous sequence within Elongin A2 is indicated. 

[0024] FIG. 6 shows the nucleotide and amino acid 
sequence of Elongin A3 (SEQ ID NO: 1 and 2, respectively). 
The transcription factor SII similarity motif is shown by the 
boldfaced bases in the top 6 lines of the figure. The nuclear 
localization signal is shown by the boldfaced bases in the 
bottom 2 lines of the figure. The site of the (G/A) polymor- 
phism used for imprinting analysis is boldfaced at nucleotide 
910, and the PCR primers specific for Elongin A3 are shown 
in boldfaced type beginning on the lines that have numbers 
811 and 1261 to the left. 

[0025] FIGS. 7A-D illustrate tissue-specific imprinting of 
Elongin A3. The (G/A) polymorphism was used to assess 
allele-specific expression in four heterozygous fetuses 
denoted A, B, C, and D. Chromatograms of genomic DNA 
(gDNA) sequence are included to show heterozygosity, as 
well as the homozygous maternal decidual DNA indicating 
parental origin. (A) Monoallelic expression of the maternal 
allele in lung, central nervous system (CNS), and limbs, and 
biallelic expression in kidney. (B) Monoallelic expression of 
the maternal allele in placenta and CNS, and biallelic 
expression in intestine. (C) Monoallelic expression of the 
maternal allele in CNS, biallelic expression in kidney and 
liver. (D) Monoallelic expression of the maternal allele in 
placenta, and biallelic expression in kidney and liver. 
Sequencing was done bidirectionally in all cases, and 
monoallelic expression of the maternal allele did not depend 
on whether that allele was A or G. 

[0026] FIG. 8 shows sequence conservation of methylated 
CpG islands between human and mouse. Human methylated 
CpG islands and ~1 kb of flanking DNA were compared to 
mouse sequence, synteny was confirmed, the corresponding 
mouse CpG islands were identified, and regions of conser- 
vation (percentage shown) were determined. In the case of 
gDMR 1-21, the corresponding mouse sequence, while 
GC-rich, showed an observed to expected CpG ratio of 
0.45-0.50 and therefore was not classified as a CpG island. 

DETAILED DESCRIPTION OF THE INVENTION 

[0027] To identify chromosomal regions that might harbor 
imprinted genes, the present invention provides a method for 
generating a library of normally methylated GC rich regions 
(e.g., a CpG island). Most of the nucleic acid sequences 
containing methylated CpG islands or GC rich regions 
isolated using the methods of the invention are high copy 
number dispersed repeats. However, unique clones in the 
library can be identified and characterized. Some of the 
unique clones identified herein were differentially methy- 
lated in uniparental tissue of germline origin. These clones 
are referred to herein as germline differentially methylated 
regions (gDMRs). 

[0028] Surprisingly, many of the methylated CpG islands
or GC rich regions identified in the Examples herein are
methylated in germline tissues of both parental origins,
representing a previously uncharacterized class of normally
methylated CpG islands or GC rich regions in the genome,
which we term similarly methylated regions (SMRs).
These SMRs, in contrast to the gDMRs, are shown herein to 
be significantly associated with telomeric band locations, 
suggesting a potential role for SMRs in chromosome orga- 
nization. Finally, many of the methylated CpG islands or GC 
rich regions are on average 85% conserved between mouse 
and human. While many CpG or GC rich regions are CpG 
islands, the methods of the invention are not limited to CpG 
islands. 

[0029] In one embodiment, the invention provides a
method for determining a disease state in a subject by
determining the DNA methylation status at a cytosine resi- 
due of a CpG sequence in a genomic DNA sample from the 
subject, wherein hypomethylation of a CpG sequence nor- 
mally methylated in a subject not having the disease state, is 
indicative of a disease state in the subject. The CpG 
sequence is typically found within a GC rich region or a 
CpG island. The invention methods are preferably used 
when the subject is a human. Although the disease state is 
often cancer, the invention is not so limited. The disease 
state includes other diseases such as multiple sclerosis, 
Alzheimer's disease, Parkinson's disease, depression and 
other imbalances of mental stability, atherosclerosis, cystic 
fibrosis, diabetes, obesity, Crohn's disease, and altered cir- 
cadian rhythmicity, arthritis, inflammatory reactions or dis- 
orders, psoriasis and other skin diseases, autoimmune dis- 
eases, allergies, hypertension, anxiety disorders, 
schizophrenia and other psychoses, osteoporosis, muscular 
dystrophy, amyotrophic lateral sclerosis or circadian 
rhythm-related conditions. 

[0030] In another embodiment, the invention provides a 
method for determining the DNA methylation status at a 
cytosine residue of a CpG sequence in a genomic DNA 
sample by performing methylation state analysis of one or 
more CpG islands or GC rich regions of a genomic DNA 
sample, thereby determining the DNA methylation status in 
the genomic DNA sample. One method for performing 
methylation state analysis is exemplified in the Examples
herein. In one aspect, the one or more CpG islands or GC 
rich regions include differentially methylated regions 
(DMRs). In another aspect, the one or more CpG islands and 
GC rich regions include similarly methylated regions 
(SMRs). 

[0031] In one aspect the present invention provides a 
method for identifying a CpG island or GC rich region 
methylation state that includes performing methylation state
analysis of one or more CpG islands or GC rich regions of
a genomic DNA sample, wherein the one or more CpG 
islands or GC rich regions are at least one of SEQ ID NOs: 
1-31, and in certain embodiments, SEQ ID NOs: 3-7 and 
9-31. 

[0032] In one aspect the present invention provides a 
method for identifying a CpG island or GC rich region 
methylation state that includes performing methylation state 
analysis of one or more CpG islands or GC rich regions of 
a genomic DNA sample. In one aspect, the one or more CpG 
islands or GC rich regions are at least one of SEQ ID NOs: 
1-31, and in certain embodiments, SEQ ID NOs: 3-7 and 
9-31 with the proviso that it is not SEQ ID NO: 8. CpG 
islands or GC rich regions are sequences greater than 200 bp
in length, with a GC content >0.5, and a CpGobs/CpGexp 
(observed to expected ratio based on GC content) >0.6 
(Gardiner-Garden and Frommer 1987). 
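
By way of illustration only, the three thresholds quoted above can be checked with a short Python sketch. The function names and default arguments are hypothetical and are not part of the disclosure; the observed to expected ratio is computed as (number of CpG x sequence length) / (number of C x number of G), following Gardiner-Garden and Frommer.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def meets_island_criteria(seq: str, min_len: int = 200,
                          min_gc: float = 0.5,
                          min_ratio: float = 0.6) -> bool:
    """Apply the length, GC content and CpGobs/CpGexp thresholds."""
    return (len(seq) > min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)
```

A sequence that fails any one of the three tests is not classified as a CpG island under these criteria, although it may still be a GC rich region within the meaning used herein.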

[0033] These methods are useful in providing information 
regarding gene regulation since it is known that methylation 
of CpG islands or GC rich regions affects gene expression 
(Ferguson-Smith et al. 1993; Razin and Cedar 1994; Barlow
1995; Ohlsson et al. 2001; Herman et al. 1994; and Merlo et
al. 1995). For example, expression of a tumor suppressor
gene can be abolished by de novo DNA methylation of a
normally unmethylated CpG island or GC rich region (Issa
et al., Nature Genet., 7:536, 1994; Herman et al., supra;
Merlo et al., Nature Med., 1:686, 1995; Herman et al.,
Cancer Res., 56:722, 1996; Graff et al., Cancer Res.,
55:5195, 1995; Herman et al., Cancer Res., 55:4525, 1995).
Consistent with the role of the CpG islands or GC rich 
regions identified herein in gene regulation, most of the 
methylated CpG islands or GC rich regions disclosed herein 
are localized within or near the coding sequence of known 
genes or of anonymous ESTs within the GenBank or Celera
databases. The GC rich regions may be in exons, introns or 
regulatory regions, for example. 

[0034] In all the methods described herein, the identifica- 
tion of sequences normally methylated and which have lost 
methylation is used for identifying a disease or disease state. 
Such disease or disease state includes cancer, multiple 
sclerosis, Alzheimer's disease, Parkinson's disease, depres- 
sion and other imbalances of mental stability, atherosclero- 
sis, cystic fibrosis, diabetes, obesity, Crohn's disease, and 
altered circadian rhythmicity, arthritis, inflammatory reac- 
tions or disorders, psoriasis and other skin diseases, autoim- 
mune diseases, allergies, hypertension, anxiety disorders, 
schizophrenia and other psychoses, osteoporosis, muscular 
dystrophy, amyotrophic lateral sclerosis and circadian 
rhythm-related conditions. Preferred subjects for the present 
methods are mammals such as humans. 

[0035] The methylation state of CpG refers to whether a 
particular cytidine residue in a CpG containing dinucleotide 
contains any degree of methylation. A CpG dinucleotide or 
a CpG island or GC rich region is characterized as either 
methylated or non-methylated based on whether any 
cytidines of the CpG island or GC rich region are methy- 
lated. The methylation state of a CpG island or GC rich 
region may be completely unmethylated, completely methy- 
lated, or partially methylated, and the degree of methylation 
can be quantified as a percent of residues methylated, as well 
as individually methylated CpG sites identified. In addition, 
a particular site can be variably methylated in a population 
of cells, and that degree of methylation can be quantified. 
Prior to the present invention, it had been thought that CpG 
dinucleotides or islands or GC rich regions were typically 
unmethylated, meaning that the degree of methylation 
would be nearly zero or quite low (such as less than 10%). 
Thus, a CpG dinucleotide having a degree of "normal"
methylation greater than this nearly zero amount is referred to as a "normally"
methylated CpG dinucleotide. 
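
As a minimal sketch of how the degree of methylation described above might be quantified, the following Python functions compute the percent of residues methylated and assign one of the three states named in the text. The function names and the representation of methylation calls as booleans are assumptions made for illustration.

```python
def percent_methylated(calls):
    """Percent of CpG sites called methylated (calls: iterable of bools)."""
    calls = list(calls)
    if not calls:
        raise ValueError("no CpG sites supplied")
    return 100.0 * sum(calls) / len(calls)


def methylation_state(calls):
    """Classify a region from its per-site methylation calls."""
    pct = percent_methylated(calls)
    if pct == 0.0:
        return "completely unmethylated"
    if pct == 100.0:
        return "completely methylated"
    return "partially methylated"
```

The same calculation applies to a single site that is variably methylated in a population of cells, in which case each call represents one cell or one sequenced clone rather than one CpG site.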

[0036] Methylation state analysis of CpG islands or GC 
rich regions can be performed by any method known in the 
art. Most of the methods developed to date for detection of 
methylated cytosine depend upon cleavage of the phosphodiester
bond alongside cytosine residues, using either
methylation-sensitive restriction enzymes or reactive chemicals
such as hydrazine which differentiate between cytosine
and its 5-methyl derivative. Examples of methylation-sensitive
restriction endonucleases which can be used to detect
5' CpG methylation include SmaI, SacII, EagI, MspI, HpaII,
BstUI and BssHII.

[0037] Genomic sequencing protocols which identify a
5-MeC residue in genomic DNA as a site that is not cleaved
by any of the Maxam-Gilbert sequencing reactions can also
be used. Other techniques utilize bisulfite treatment of
DNA as a starting point for methylation analysis. These
include methylation-specific PCR (MSP) (Herman et al.,
Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996); and
restriction enzyme digestion of PCR products amplified
from bisulfite-converted DNA (Sadri and Hornsby, Nucl.
Acids Res. 24:5058-5059, 1996; and Xiong and Laird, Nucl.
Acids Res. 25:2532-2534, 1997). See also 6,262,171; 6,200,756;
6,017,704; and 5,786,146, all incorporated herein by reference.
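
The principle underlying the bisulfite-based techniques can be sketched in Python as follows. Bisulfite treatment deaminates unmethylated cytosine to uracil, which is amplified as thymine, whereas 5-methylcytosine is protected and is still read as cytosine. The function names and the use of 0-based positions are illustrative assumptions, and the sketch models a single strand only.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is read as T; a C whose 0-based index is in
    methylated_positions is protected and remains C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def infer_methylated_cytosines(original: str, converted: str) -> list[int]:
    """Positions where a C in the original is still C after conversion."""
    return [i for i, (a, b) in enumerate(zip(original.upper(), converted))
            if a == "C" and b == "C"]
```

Comparing the converted read with the untreated reference therefore recovers the methylated positions, which is the comparison that sequencing, MSP primers, or a restriction digest of the PCR product makes in different ways.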

[0038] In certain embodiments of this aspect of the present 
invention, CpG island or GC rich region methylation state is 
determined for similarly methylated regions (SMRs). The 
Examples included herein utilize methods of the present 
invention to identify numerous human SMRs. SMRs are 
CpG islands or GC rich regions that are methylated equally 
in male and female tissue of germline origin. The SMRs can 
include at least one of SEQ ID NO: 3, SEQ ID NO: 4, SEQ 
ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 9, 
SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID 
NO: 16, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, 
SEQ ID NO: 26, SEQ ID NO: 29, and SEQ ID NO: 30, as 
identified in Table 1. Thus, in one aspect, the invention 
provides a method for determining the methylation status of 
a population of similarly methylated regions (SMRs) in a 
subject by performing methylation status analysis of a 
population of SMRs of genomic DNA from a human sample. 
In one aspect, the methylation status of SMRs is correlated 
with a disease state. In one aspect, the population of SMRs 
comprises at least two SMRs and in one aspect, at least three 
SMRs. 

[0039] Of the SMRs identified herein, sixteen of
seventeen were localized near the ends of chromosomes,
either on the last (n=15) or the penultimate (n=1) subband of
the chromosome on which they reside. The method of this
aspect of the invention can identify the methylation state of 
an SMR located near the end of a chromosome. Table 2 and 
FIG. 5 show the location of specific SMRs of the invention. 
The Examples included herein also identify CpG islands or 
GC rich regions that are differentially methylated in germ- 
line tissue of male and female origin. These CpGs are 
referred to herein as germline differentially methylated 
regions (gDMRs). The method of this aspect of the invention 
can identify the methylation state of a gDMR. For example, 
the methylation state can be determined for at least one of
SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID 
NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 24, 
SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, and SEQ 
ID NO: 31. 

[0040] The methylated CpG islands or GC rich regions 
identified herein were distributed throughout the genome. 
There was a striking localization of SMRs near the ends of 
chromosomes. Sixteen of 17 SMRs were localized near the 
ends of chromosomes, either on the last (n=15) or the
penultimate (n=1) subband of the chromosome on which it
resided (Table 2). In contrast, of 12 gDMRs that could be 
mapped (of the 13 gDMRs studied), only four were localized 
near the ends of chromosomes (Table 2). This difference was 
highly statistically significant (P=0.0008, Fisher's exact 
test). The association of SMRs near the ends of chromo- 
somes is consistent with an observation of densely methy- 
lated GC-rich sequences near telomeres, although that study 
did not describe methylated CpG islands or GC rich regions 
(Brock et al. 1999). In addition, there was a segregation of 
gDMRs and SMRs within compartments of differing 
genomic composition, i.e., isochores, which are regions of 
several hundred kilobases of relatively homogeneous GC 
composition (Bernardi 1995). Approximately 75% of the 
SMRs fell within high isochore regions (G+C≥50%), as
might be expected from the high GC content of methylated 
CpG islands or GC rich regions. Surprisingly, however, all 
of the gDMRs fell within low isochore regions (G+C<50%), 
i.e., of relatively low GC content, despite the high GC 
content of the gDMRs themselves (L. Z. Strichman-Almashanu
and A. P. Feinberg). This difference was statistically
significant (P<0.01, Fisher's exact test). Thus, the gDMRs 
and SMRs may lie within distinct chromosomal and/or 
isochore compartments. These results provide the basis for 
a method to identify epigenetic chromosomal domains. 
Localization of CpG islands or GC rich regions to the 
telo/subtelo regions, for example, can be used for identifying 
imprinted gene domains, disease domains (e.g., p16), chromatin
regulated genes controlled at a distance, such as
telomerase (TERT) or c-myc by CTCF; and developmen- 
tally programmed regions essential for organ formation, 
such as the brain in Lunyak et al. (Science. Oct. 24, 2002), 
for example. 
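
For illustration, the comparison of SMR and gDMR localization reported above (16 of 17 SMRs versus 4 of 12 mapped gDMRs near chromosome ends) can be reproduced with a small two-sided Fisher's exact test written against the Python standard library. The function name is hypothetical, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the reported value was obtained.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: SMRs, gDMRs. Columns: near chromosome end, not near end.
p_value = fisher_exact_two_sided(16, 1, 4, 8)
```

With these counts the sketch gives a P value of about 0.00086, consistent with the value reported in the text.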

[0041] The method of this aspect of the invention for 
identifying the CpG island or GC rich region methylation
state can involve identifying the methylation state of one, or 
more than one CpG island or GC rich region. For example, 
the methylation state of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 20, 25 CpG islands or GC rich regions, or in certain 
embodiments all 29 CpG islands or GC rich regions disclosed
herein (SEQ ID NOs: 3-31), is identified. In fact,
according to the present invention the methylation state of 
any of the CpG islands or GC rich regions disclosed in Table 
1 can be determined. 

[0042] In an embodiment of this aspect of the present
invention, the methylation state of SEQ ID NO: 27 is
identified. gDMR 1-13 (SEQ ID NO: 27) is located on 18q23
within a predicted gene of unknown function, and near the 
SALL3 gene, a candidate gene for 18q deletion syndrome, 
which involves preferential loss of the paternal allele (Kohl- 
hase et al. 1999). 

[0043] In another embodiment of this aspect of the present 
invention, the methylation state of SEQ ID NO: 30 is 
identified. SMR 1-2 (SEQ ID NO: 30) is located on 19q13.4
within 110 kb of a glioma tumor suppressor candidate gene. 

[0044] In another embodiment of this aspect of the present 
invention, the method includes identifying the methylation 
state of SEQ ID NO: 4 (SMR 3-20). This CpG island or GC 
rich region is located within the HDAC4 gene (See FIG. 5) 
and there are several other predicted genes and antisense 
transcripts near this CpG island or GC rich region. 



[0045] In another embodiment of this aspect of the present 
invention, the method includes identifying the methylation 
state of SEQ ID NO: 26, located within 16 kb from CpG 
island or GC rich region 2-3. 

[0046] In another embodiment of this aspect of the present 
invention, the method includes identifying the methylation 
state of SEQ ID NO: 21. SEQ ID NO: 21 (SMR 3-110) is 
located near a predicted apoptosis inhibitor, a septin-like cell 
division gene, a ras homolog, and a predicted translation 
initiation factor. 

[0047] In another embodiment of this aspect of the present 
invention, the method includes identifying the methylation 
state of SEQ ID NO: 23. SEQ ID NO: 23 (SMR 1-12) is 
located near a predicted apoptosis inhibitor, a septin-like cell 
division gene, a ras homolog, and a predicted translation 
initiation factor. 

[0048] In another embodiment of this aspect of the present 
invention, the method includes identifying the methylation 
state of SEQ ID NO: 21 (SMR 3-110) and SEQ ID NO: 23 
(SMR 1-12). Together these CpGs flank a predicted apop- 
tosis inhibitor, a septin-like cell division gene, a ras 
homolog, and a predicted translation initiation factor. 

[0049] In another aspect, the present invention provides a 
method for determining the methylation state of a series of 
similarly methylated regions (SMRs) in a subject, the
method comprising performing methylation state analysis
on a series of SMRs of genomic DNA from a human sample.
The present disclosure reveals that normally methylated
single-copy CpG islands or GC rich regions are
more abundant than previously believed. The ability to
analyze the methylation state of a series of these normally 
methylated CpGs provides valuable information regarding 
the overall methylation state of a genome. This information 
may provide information regarding overall chromatin state 
of a genome since SMRs appear to be located near the ends 
of chromosomes, as illustrated in the Examples herein. 
Furthermore, such information may provide prognostic,
diagnostic, or disease monitoring tools related to cancer,
based on previous observations that implicate
genomic methylation in cancer.

[0050] The series of SMRs whose methylation state is 
determined can include at least two, three, four, five, ten, 15, 
or all of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ
ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10,
SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 16, SEQ ID
NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 26, 
SEQ ID NO: 29, and SEQ ID NO: 30. 

[0051] As discussed above, methods of the present inven- 
tion for identifying a CpG island or GC rich region methy- 
lation state and methods below for identifying a population 
of CpG islands or GC rich regions, provide valuable infor- 
mation regarding gene regulation since it is known that 
methylation of CpG islands or GC rich regions can affect 
expression of genes located near the CpG island or GC rich 
region. Accordingly, in one aspect, the present invention 
provides a method for identifying the presence of an 
imprinted gene that includes comparing the methylome of 
genomic DNA of maternal origin with the methylome of 
genomic DNA of paternal origin, wherein a difference in 
methylation patterns between the two methylomes is indica- 
tive of the presence of an imprinted gene. A methylome is 
the methylation pattern of an entire genome (Feinberg
2001). DNA methylation serves as an additional layer of 
genetic information in the genome (Feinberg 2001). Typi- 
cally, methylation in a genome occurs in CpG islands or GC 
rich regions. A methylome of a subject can be determined 
using methods disclosed herein. 
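
A comparison of maternal and paternal methylomes of the kind described above can be sketched as follows. The representation of a methylome as a mapping from region name to percent methylation, the region names, and the 50 percentage point threshold are all hypothetical choices made for illustration only.

```python
def differentially_methylated(maternal: dict[str, float],
                              paternal: dict[str, float],
                              min_difference: float = 50.0) -> list[str]:
    """Regions whose percent methylation differs between parental origins.

    Only regions measured in both methylomes are compared.
    """
    shared = maternal.keys() & paternal.keys()
    return sorted(region for region in shared
                  if abs(maternal[region] - paternal[region]) >= min_difference)


# Hypothetical example values, not data from the disclosure.
maternal = {"region_A": 95.0, "region_B": 90.0}
paternal = {"region_A": 5.0, "region_B": 88.0}
candidates = differentially_methylated(maternal, paternal)
```

Regions returned by such a comparison would be candidates only; whether a nearby gene is in fact imprinted would still need to be established by allele-specific expression analysis.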

[0052] The present invention includes an imprinted gene 
identified by the above method. Genomic imprinting is the 
parental origin-specific differential expression of the two 
alleles of a gene. Most imprinted genes show differential 
germline methylation of associated CpG islands or GC rich 
regions (reviewed in Ohlsson et al. 2001). 

[0053] In another embodiment of this aspect of the inven- 
tion, a method is provided for identifying the presence of an 
imprinted gene, that includes identifying a population of 
CpG islands or GC rich regions and identifying a candidate 
gene found within 200 kilobases of a first CpG island or GC 
rich region of the population of CpG islands or GC rich 
regions. A determination is made of whether the candidate 
gene is regulated by methylation of the first CpG rich region 
of the population of CpG islands or GC rich regions and 
preferentially methylated in genomic DNA from paternal or 
maternal origin. Regulation of the candidate gene by methy- 
lation of the first CpG island or GC rich region and paternal 
or maternal preferential methylation is indicative of an 
imprinted gene. The present invention includes imprinted 
genes identified by the above method. 

[0054] In certain embodiments, the first CpG island or GC 
rich region is gDMR 3-4 (SEQ ID NO: 27). Interestingly, 
gDMR 3-4 is located on 18q23, which has been implicated 
in bipolar affective disorder, specifically harboring a predis- 
posing gene transmitted preferentially through the father 
(Stine et al. 1995; McMahon et al. 1997). Therefore, the
localization of this gDMR herein can serve as a guidepost 
for identifying candidate imprinted genes for this important 
disease. 

[0055] In another aspect the present invention provides a 
method for identifying a CpG island or GC rich region- 
regulated gene. The method includes identifying a candidate 
gene located on a chromosome near a CpG island or GC rich 
region and determining whether the expression of the can- 
didate gene is regulated by methylation of the CpG island or 
GC rich region. Preferably, the CpG islands or GC rich 
regions used in the method include at least one of SEQ ID 
NO: 3-31. In certain embodiments, the method includes 
identifying the methylation state of a CpG island or GC rich 
region other than SEQ ID NO: 8 (gDMR 3-4), which has 
been identified as a gDMR (Arima et al. 2000). 

[0056] A CpG island or GC rich region-regulated gene is 
a gene whose expression is regulated by methylation of a 
CpG island or GC rich region. A CpG island or GC rich 
region is located near a candidate gene when it is located 
within about 2000, 1000, 500, 200 or 100 kilobases of the 
gene. In other embodiments of the invention, the CpG island
or GC rich region is located within about 50, 25, 10, 5, or 1
kilobase of the gene. In other embodiments, the CpG island
or GC rich region is located within a candidate gene. For
example, the Prader-Willi syndrome CpG island or GC rich
region in exon 1 of SNRPN controls expression of genes up
to 2 megabases away (see, e.g., Buiting et al., Inherited
microdeletions in the Angelman and Prader-Willi syndromes
define an imprinting centre on human chromosome 15, Nat.
Genet. 9(4):395-400, April 1995).



[0057] The genes on the same chromosome
as a CpG island or GC rich region, and the approximate
distance between a CpG island or GC rich region and a
candidate gene, can be determined by mapping the CpG
island or GC rich region and candidate gene sequences using 
human genome sequence information on databases such as 
GenBank (available at http://www.ncbi.nlm.nih.gov/) or the 
Celera human gene sequence database. 
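
Once the CpG island or GC rich region and the candidate gene have been mapped to coordinates on the same chromosome, the distance test can be sketched in Python as below. The interval representation (start, end) in base pairs, the function names, and the 200 kilobase default are assumptions made for illustration; any of the distance thresholds recited herein could be substituted.

```python
def distance_kb(island: tuple[int, int], gene: tuple[int, int]) -> float:
    """Gap in kilobases between two intervals on one chromosome.

    Returns 0.0 when the intervals overlap, for example when the
    island lies within the gene.
    """
    island_start, island_end = island
    gene_start, gene_end = gene
    gap = max(gene_start - island_end, island_start - gene_end, 0)
    return gap / 1000.0


def is_near(island: tuple[int, int], gene: tuple[int, int],
            max_kb: float = 200.0) -> bool:
    """True when the gene lies within max_kb kilobases of the island."""
    return distance_kb(island, gene) <= max_kb
```

The sketch assumes that both intervals are given on the same chromosome and in the same coordinate system; it does not itself query GenBank or any other database.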

[0058] Methods are known in the art for determining 
whether expression of a candidate gene is regulated by 
methylation of a CpG island (see, e.g., Ferguson-Smith et al.
1993; Razin and Cedar 1994; Barlow 1995; Ohlsson et al.
2001; Herman et al. 1994; and Merlo et al. 1995). For
example, the effect on gene expression of de novo DNA
methylation of a normally unmethylated CpG island can be
analyzed (Issa et al., Nature Genet., 7:536, 1994; Herman
et al., supra; Merlo et al., Nature Med., 1:686, 1995;
Herman et al., Cancer Res., 56:722, 1996; Graff et al.,
Cancer Res., 55:5195, 1995; Herman et al., Cancer Res.,
55:4525, 1995).

[0059] In one embodiment, the method includes determin- 
ing whether the candidate gene is regulated by methylation 
of SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ 
ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 
24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, and 
SEQ ID NO: 31. This embodiment includes the CpGs
identified herein as being gDMRs. 

[0060] In an embodiment of this aspect of the present
invention, the method includes determining whether the
candidate gene is regulated by methylation of SEQ ID NO:
27. gDMR 1-13 (SEQ ID NO: 27) is located on 18q23 within
a predicted gene of unknown function, and near the SALL3 
gene, a candidate gene for 18q deletion syndrome, which 
involves preferential loss of the paternal allele (Kohlhase et 
al. 1999). 

[0061] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of SEQ ID 
NO: 30. SMR 1-2 (SEQ ID NO: 30) is located on 19q13.4
within 110 kb of a glioma tumor suppressor candidate gene. 

[0062] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of SEQ ID NO: 
4 (SMR 3-20). This CpG island or GC rich region is located 
within the HDAC4 gene (See FIG. 5) and there are several 
other predicted genes and antisense transcripts near this CpG 
island or GC rich region. 

[0063] In another embodiment of this aspect of the present
invention, the method includes determining whether the
candidate gene is regulated by methylation of SEQ ID NO: 
26, located within 16 kb from CpG island or GC rich region 
2-3. 

[0064] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of SEQ ID 
NO:21. SEQ ID NO: 21 (SMR 3-110) is located near a 
predicted apoptosis inhibitor, a septin-like cell division 
gene, a ras homolog, and a predicted translation initiation 
factor. 

[0065] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of SEQ ID NO:
23. SEQ ID NO: 23 (SMR 1-12) is located near a predicted 
apoptosis inhibitor, a septin-like cell division gene, a ras 
homolog, and a predicted translation initiation factor. 

[0066] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of SEQ ID NO: 
21 (SMR 3-110) and SEQ ID NO: 23 (SMR 1-12). Together 
these CpGs flank a predicted apoptosis inhibitor, a septin-like
cell division gene, a ras homolog, and a predicted
translation initiation factor. 

[0067] In another embodiment of this aspect of the present 
invention, the method includes determining whether the 
candidate gene is regulated by methylation of a similarly 
methylated region (SMR). For example, the method can 
determine whether the candidate gene is regulated by methy- 
lation of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, 
SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 
10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 16, SEQ 
ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 
26, SEQ ID NO: 29, or SEQ ID NO: 30. 

[0068] In another aspect the present invention provides a 
method for identifying a population of CpG islands or GC 
rich regions in a genome. In one illustrative aspect, the 
invention utilizes a method for isolating a library of nor- 
mally methylated CpG islands or GC rich regions disclosed 
herein. A method according to this aspect of the invention 
provides a genome-wide scan to identify a population of 
CpG islands or GC rich regions based on the combination of 
restriction enzymes used for the method. Therefore, a 
method according to this aspect of the invention identifies 
multi-copy CpG islands or GC rich regions within repeats as 
well as single copy CpG islands or GC rich regions. The 
method includes performing a double digestion by cleaving 
genomic DNA with both a restriction enzyme that cleaves at 
a recognition site with an AT content of greater than 50%, 
preferably greater than 75% AT, most preferably 100% AT, 
and a restriction endonuclease that cleaves at an unmethylated
restriction site comprising greater than 50% GC,
preferably greater than 75% GC, most preferably 100% GC,
to generate a series of restriction fragments. The
restriction fragments are typically size fractionated by length
as discussed below, and fragments of a specified length (e.g.,
greater than 500 base pairs) are cloned in a restriction-negative
bacterium to generate a first library. This first cloning
step enriches for CpG islands or GC rich regions and 
eliminates unmethylated CpG islands or GC rich regions 
because of the methylcytosine sensitivity of the restriction 
enzyme that recognizes only unmethylated restriction sites. 
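
The enrichment achieved by the double digestion and size selection can be illustrated with a simplified in-silico sketch. It assumes Mse I (TTAA, cut as T^TAA) as the AT-rich cutter and Hpa II (CCGG, cut as C^CGG) as the methylation-sensitive cutter, treats an Hpa II site as uncut when its internal cytosine is methylated, and models one strand only. The function names and the representation of methylation as a set of 0-based cytosine positions are hypothetical.

```python
def cut_positions(seq, site, offset, blocked=lambda i: False):
    """Indices at which seq is cut for every unblocked occurrence of site."""
    cuts, start = [], seq.find(site)
    while start != -1:
        if not blocked(start):
            cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def double_digest(seq, methylated, min_length=500):
    """Fragments longer than min_length after an Mse I + Hpa II digest.

    methylated: set of 0-based indices of methylated cytosines.
    """
    seq = seq.upper()
    mse = cut_positions(seq, "TTAA", 1)  # T^TAA, insensitive to CpG methylation
    hpa = cut_positions(seq, "CCGG", 1,  # C^CGG, blocked when internal C is methylated
                        blocked=lambda i: (i + 1) in methylated)
    bounds = sorted({0, len(seq), *mse, *hpa})
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments if len(f) > min_length]
```

In this model a GC rich stretch whose Hpa II sites are methylated survives as one long fragment and passes the size selection, whereas the same stretch in unmethylated form is cut into shorter fragments and is lost, which is the basis of the enrichment described above.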

[0069] The next step (i.e. the second cloning step) pro- 
vides further enrichment of CpG islands or GC rich regions 
by digesting DNA from the first library with an infrequently
cutting restriction endonuclease specific for sequences common
to CpG islands or GC rich regions (e.g., a CpG rich
region infrequent restriction endonuclease). An infrequently
cutting restriction endonuclease is an endonuclease that 
recognizes a GC-rich recognition site (e.g., greater than 50% 
GC content) of at least 6 base pairs in length (see Gardiner-Garden
and Frommer, 1987). As used herein, the methods of
the invention include not only CpG islands or GC rich
regions (e.g., greater than 200 bp in length with a GC content
of >0.5) or CpG islands or GC rich regions in which
CpGobs/CpGexp >0.6, but also CpG islands or GC rich regions that
do not meet these threshold requirements but which are GC
rich and contain multiple CpG dinucleotides (see also
Strichman-Almashanu et al., 2002, herein incorporated by
reference in its entirety). Preferably the recognition site
recognized by the infrequently cutting restriction endonuclease
has a GC content of at least 75%, most preferably
100% GC. This second cloning step results in isolation of
relatively large fragments of CpG islands or GC rich regions 
that are normally methylated (i.e., survived the first cloning 
step), but are now unmethylated in the library and therefore 
amenable to digestion and subcloning. 

[0070] Virtually any endonuclease that cleaves at a restriction
site with a GC content of at least 50%, preferably 75%,
and most preferably 100% can be used for the first cloning
step in combination with virtually any endonuclease that
cleaves at a restriction site with an AT content of at least
50%, preferably 75%, and most preferably 100%. For
example, the GC-rich recognition site cleaving enzymes
include but are not limited to Hpa II, Btg I, Sac II, NgoM IV,
BssH II, Nae I, Eag I, BsiE I, Kas I, PspOM I, Nar I, Sfo I, and
Apa I. The AT-rich recognition site cleaving enzyme can be,
for example, Mse I, Ssp I, Dra I, Tsp509 I, Apo I, Ase I, or
Psi I. In one embodiment, a double digestion is performed
with Mse I, which recognizes the sequence TTAA,
and Hpa II, which recognizes the sequence CCGG at unmethylated
sites.

[0071] The GC-rich recognition site cleaving enzyme and
the AT-rich recognition site cleaving enzyme can be used in 
any order or simultaneously depending on the required 
reaction conditions for the restriction enzymes used, as will 
be understood. 

[0072] A restriction-negative bacterium is used in the first
cloning step in order to avoid bacterial digestion of methylated
genomic DNA. Virtually any restriction-negative bacterium
can be used in the methods of the present invention.
Many strains of bacteria have been derived that are
deficient in, for example, the bacterial enzymes mcrA,
mcrCB and mrr. For example, the restriction-negative
bacterium can be XL2-Blue MRF'. Another example is
Stratagene XL10-Gold.

[0073] For the restriction digest before the second cloning 
step, virtually any endonuclease that recognizes a GC-rich 
recognition site at least 6 base pairs in length can be used. 
Examples of restriction endonucleases that can be used in
this step include Eag I. In certain embodiments, the restriction
endonuclease Eag I (recognition sequence CGGCCG)
is used. In these embodiments using Eag I, the resulting
library can be referred to as the Eag library.

[0074] Preferably, after the digestions before the first and
second cloning steps, DNA fragments of specified lengths
are isolated and cloned. For example, fragments of at least 
100 bp, 250 bp, 500 bp, 2500 bp, and in certain embodi- 
ments at least 1000 bp are isolated and cloned. In other 
embodiments, DNA fragments of specified size ranges can 
be isolated and cloned. For example, fragments of 100-500 
bp, 500-1000 bp, and greater than 1000 bp can be isolated 
and cloned separately. Methods are well known in the art for 
size fractionating nucleic acids, such as by using gel puri- 
fication. 
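
The size-selection step lends itself to a short illustration. The following Python sketch is not part of the disclosed method. It assumes an in silico digest in which Mse I cuts at TTAA (after the first T), ignores methylation sensitivity entirely, and assigns fragments to the size ranges named above. The function names `digest` and `size_bin`, and the choice to place a fragment of exactly 500 bp or 1000 bp in the lower range, are assumptions made for illustration.

```python
import re


def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site.

    cut_offset is the cut position within the recognition site
    (assumed value: 1 for Mse I, i.e. T^TAA).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping occurrences are all found.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_bin(length_bp):
    """Assign a fragment length to a size range (boundary handling assumed)."""
    if length_bp < 100:
        return None  # shorter than the smallest range in the text
    if length_bp <= 500:
        return "100-500 bp"
    if length_bp <= 1000:
        return "500-1000 bp"
    return ">1000 bp"


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    toy = "G" * 120 + "TTAA" + "C" * 600 + "TTAA" + "A" * 30
    for fragment in digest(toy, "TTAA", 1):
        print(len(fragment), size_bin(len(fragment)))
```

In practice the size ranges would be separated physically, for example by gel purification as noted above; the sketch only shows how fragment lengths map onto the stated ranges.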

[0075] By repeating the aforementioned method for identifying
a population of CpG islands or GC rich regions in a
genome with different restriction enzymes, and by utilizing 
various known methods such as methylation specific PCR or 
bisulfite methods, described herein and known in the art, 
virtually the entire methylome of an organism can be deter- 
mined. Alternatively, the method for identifying a popula- 
tion of low copy number CpG islands or GC rich regions can 
be repeated with different restriction enzymes to identify 
virtually all the low copy number CpG islands or GC rich 
regions of a methylome. 

[0076] The population of CpG islands or GC rich regions 
in methods of this aspect of the invention can include a 
subset of at least about 50, 100, 200, 250, 500, or 1000
palindromic CpG sites. Additionally, the population of CpG 
islands or GC rich regions can include at least about 2, 3, 4, 
5, 10, 20, 25, 50, or 100 distinct CpG islands or GC rich 
regions. 

[0077] The ability to characterize entire methylomes pro- 
vides further uses for the methods of the invention. For 
example, methylomes of the same species, for example 
human methylomes, or portions of a methylome identified 
using a first set of restriction enzymes to perform the above 
method of the invention, can be compared to identify 
methylation differences that are involved in phenotypic 
differences among individuals of a species. Furthermore, 
methylomes, or portions thereof, between species can be 
compared to identify CpG islands or GC rich regions that are 
important gene expression regulators, by identifying CpG 
islands or GC rich regions that are conserved between 
species. The Examples included herein provide a compari- 
son of portions of the methylome of mouse and man to 
identify conserved CpG islands or GC rich regions. 

[0078] Furthermore, the method discussed above for iden- 
tifying a population of CpG islands or GC rich regions in a 
genome, can be used to identify the methylation state of a 
series of CpG islands or GC rich regions in various tissues 
and to determine whether methylation of a CpG island or GC 
rich region is preferentially related to cells from one parent, 
or certain tissues, as illustrated in the Examples provided 
herein. As illustrated in the Examples section hereinbelow, 
62 unique CpG island or GC rich region clones were isolated 
and characterized using methods of the present invention, all 
of which were methylated and GC-rich, with a GC content 
>50%. Of these, 43 clones also showed a CpGobs/CpGexp 
>0.6, of which 30 were studied in detail. These unique 
methylated CpG islands or GC rich regions mapped to 23 
chromosomal regions, and 12 were differentially methylated 
regions in uniparental tissues of germline origin, i.e., hyda- 
tidiform moles (paternal origin) and complete ovarian ter- 
atomas (maternal origin), even though many apparently 
were methylated in somatic tissues. At least two gDMRs 
mapped near imprinted genes, HYMAI and a novel
homolog of Elongin A and Elongin A2, which we term
Elongin A3 (NM_145653), discussed in further detail
below. Surprisingly, 18 of the methylated CpG islands or GC
rich regions were methylated in germline tissues of both 
parental origins, representing a previously uncharacterized 
class of normally methylated CpG islands or GC rich 
regions in the genome, referred to herein as similarly methy- 
lated regions (SMRs). These SMRs, in contrast to the 
gDMRs, were significantly associated with telomeric band 
locations (P=0.0008), suggesting a potential role for SMRs
in chromosome organization. At least 10 of the methylated
CpG islands or GC rich regions identified herein were on 
average 85% conserved between mouse and human. These 
sequences will provide a valuable resource in the search for 
novel imprinted genes, for defining the molecular substrates 
of the normal methylome, and for identifying novel targets 
for mammalian chromatin formation. 
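
The GC-content and CpGobs/CpGexp thresholds quoted above can be illustrated with a short calculation. The Python sketch below assumes the commonly used Gardiner-Garden and Frommer form of the ratio, (number of CpG x sequence length) / (number of C x number of G). The text does not state which formula was applied, so the formula, the function names, and the default thresholds passed to `passes_thresholds` are assumptions.

```python
def gc_content(seq):
    """Fraction of G and C residues in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio, assumed formula: CpG * N / (C * G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined; treated as 0 for filtering
    return seq.count("CG") * len(seq) / (c_count * g_count)


def passes_thresholds(seq, min_gc=0.5, min_ratio=0.6):
    """True if GC content > 50% and CpGobs/CpGexp > 0.6, as quoted in the text."""
    return gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_ratio
```

Both comparisons are strict, matching the ">50%" and ">0.6" wording of the paragraph.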

[0079] Evidence for loss of methylation in cancer has been
shown by hypomethylation in genes of some human cancers as
compared to their normal counterparts (Feinberg and
Vogelstein, Nature 301(5895):89-92, Jan. 6, 1983). More
recently, this has been shown in the activation of the MAGE
melanoma antigen. Methylated CpG points identified within
the MAGE-1 promoter were shown to be involved in gene
repression (Serrano, A., Garcia, A., Abril, E., Garrido, F.,
and Ruiz-Cabello, F., Int. J. Cancer 68: 464-470, 1996). The
activation of human gene MAGE-1 in tumor cells was correlated
with genome-wide demethylation (De Smet, C., De Backer, O.,
Faraoni, I., Lurquin, C., Brasseur, F., and Boon, T., Proc.
Natl. Acad. Sci. 93: 7149-7153, 1996).

[0080] In another aspect the present invention provides a 
method for identifying a population of low copy number 
CpG islands or GC rich regions. A method according to this 
aspect of the invention includes cleaving genomic DNA with 
both a restriction enzyme that cleaves at a recognition site 
comprising adenosine and thymidine residues and a restric- 
tion endonuclease that cleaves at an unmethylated restriction 
site comprising cytidine and guanosine residues, to generate 
a series of restriction fragments and excluding those that are 
methylated, and cloning restriction fragments of at least 200, 
300, 400, 500 and the like kb from the series of restriction 
fragments in a restriction negative bacteria to generate a first 
library. This step is similar to the initial double digestion and 
first cloning step discussed above for the aforementioned 
aspect of the invention. (See FIG. 1). 

[0081] After generating the first library, the cloned DNA 
of the first library is cleaved with a restriction enzyme that 
cleaves DNA at a restriction site within a CpG island or GC 
rich region; excluding CpG island or GC rich region frag- 
ments that contain repetitive elements while leaving low 
copy CpG island or GC rich region fragments intact, thereby 
producing a population of low copy number CpG islands 
and GC rich region fragments. Such fragments may be at 
least about 100, 200, 300, 400, 500 and the like kb in size. 
The method may further include cloning the restriction 
fragments containing low copy CpG islands and GC rich 
regions to form a library containing a plurality of low copy 
CpG island or GC rich region DNA. The "excluding" step is
optionally performed by cleaving cloned DNA of the first
library with a restriction enzyme that cleaves DNA at a
restriction site within a CpG island or GC rich region repeat
sequence, by using a methylated CpG binding column, or by
other methods known to those of skill in the art. A final
library containing
a plurality of clones is also included in the invention. In one 
aspect, the GC rich regions are CpG islands. 

[0082] In another aspect, the present invention provides an 
isolated polynucleotide that includes a nucleotide sequence 
unmethylated in nucleic acid of paternal origin and
methylated in nucleic acid of maternal origin. The
polynucleotide is about 1638 nucleotides, encoding about 546
amino acids, and has about 79% amino acid sequence identity
to Elongin A2. In one embodiment, the polynucleotide is set
forth in SEQ ID NO: 1. This polynucleotide appears to be
polymorphic at
position 910, which can be G or A. In another embodiment, 
the polynucleotide encodes a polypeptide as set forth in SEQ 
ID NO: 2. 
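
As a quick consistency check on the figures above, 1638 nucleotides correspond to 1638 / 3 = 546 codons. The Python sketch below also locates a nucleotide position within its codon. It assumes, purely for illustration, that position 910 is numbered from the first nucleotide of the coding sequence (the text does not state the numbering origin), and the helper name `codon_of` is hypothetical.

```python
def codon_of(nt_position):
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Both returned values are 1-based.
    """
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    index = nt_position - 1
    return index // 3 + 1, index % 3 + 1


CDS_LENGTH_NT = 1638  # length stated in the text

if __name__ == "__main__":
    print(CDS_LENGTH_NT // 3, CDS_LENGTH_NT % 3)  # 546 0
    print(codon_of(910))                          # (304, 1)
```

Under the stated numbering assumption, the G/A polymorphism would fall at the first position of codon 304; a different numbering origin would shift this result.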

[0083] The polynucleotide of SEQ ID NO: 1 is a novel 
imprinted gene that was identified using methods of the 
present invention (as illustrated in the Examples below). The 
CpG island or GC rich region gDMR 2-78 was localized to 
18q21 (FIG. 5) and was completely methylated in all 
somatic fetal and adult tissues tested (FIG. 2). However,
this CpG rich region was unmethylated in CHM and sperm 
and methylated in OT (FIG. 3A). A BLAST search showed 
that the CpG island or GC rich region spanned the putative 
promoter region and body of a gene predicted by GEN- 
SCAN (http://genes.mit.edu/GENSCAN), and included 
1638 nucleotides encoding 546 amino acids (FIG. 6). 
BLAST searches of GenBank and Celera databases using 
the predicted sequences revealed that the predicted gene 
showed 43% amino acid identity to human transcription 
elongation factor B (SIII) polypeptide 3 (TCEB3), also 
known as Elongin A. The novel sequence was even more 
closely related to a previously identified homolog of Elongin 
A termed Elongin A2, or TCEB3L, showing 79% amino acid 
sequence identity to human transcription elongation factor 
(SIII) Elongin A2 (TCEB3L). We therefore term this gene 
Elongin A3. An alternative term is TCEB3L2, but for this 
term to apply, the nomenclature committee will need to 
rename TCEB3L (Elongin A2) TCEB3L1. 

[0084] Analysis of allele-specific expression showed
monoallelic expression in lung, brain, placenta, and spinal 
cord, with preferential expression from the maternal allele 
(FIGS. 7A-D). There was incomplete preferential expression 
from the maternal allele in two of three kidneys (FIGS. 7A, 
C), and absence of imprint-specific gene expression in one 
kidney and in the intestine or liver (FIGS. 7B, C, D). Thus, 
Elongin A3 shows tissue-specific imprinting, at least in 
prenatal development. 

[0085] In another aspect, the present invention provides an 
isolated polynucleotide that includes a nucleotide sequence 
unmethylated in nucleic acid of paternal origin and
methylated in nucleic acid of maternal origin. The
polynucleotide is about 1638 nucleotides, encoding about 546
amino acids, and has about 79% amino acid sequence identity
to Elongin A2. In an embodiment, the polynucleotide is set
forth in SEQ
ID NO: 1. This polynucleotide appears to be polymorphic at 
position 910, which can be G or A. In another embodiment, 
the polynucleotide encodes a polypeptide as set forth in SEQ 
ID NO: 2. 

[0086] The polynucleotide of SEQ ID NO: 1 is a novel 
imprinted gene that was identified using methods of the 
present invention (as illustrated in the Examples below). The 
CpG island or GC rich region gDMR 2-78 was localized to 
18q21 (FIG. 5) and was completely methylated in all 
somatic fetal and adult tissues tested (FIG. 2). However,
this CpG island or GC rich region was unmethylated in 
complete hydatidiform moles (CHM) and sperm and methy- 
lated in ovarian teratomas (OT) (FIG. 3A). A BLAST search 
showed that the CpG island or GC rich region spanned the 
putative promoter region and body of a gene predicted by 
GENSCAN (http://genes.mit.edu/GENSCAN), and 
included 1638 nucleotides encoding 546 amino acids (FIG. 
6). BLAST searches of GenBank and Celera databases using
the predicted sequences revealed that the predicted gene
showed 43% amino acid identity to human transcription 
elongation factor B (SIII) polypeptide 3 (TCEB3), also 
known as Elongin A. The novel sequence was even more 
closely related to a previously identified homolog of Elongin 
A termed Elongin A2, or TCEB3L, showing 79% amino acid 
sequence identity to human transcription elongation factor 
(SIII) Elongin A2 (TCEB3L). We therefore refer to this gene 
herein as Elongin A3, alternatively TCEB3L2. 

[0087] The Elongin A3 gene exhibits monoallelic expres- 
sion in lung, brain, placenta, spinal cord, and some kidneys, 
with preferential expression from the maternal allele (FIGS. 
7A-D). The gene also exhibits an absence of imprint-specific 
gene expression in one kidney and in the intestine or liver 
(FIGS. 7B, C, D). Thus, Elongin A3 shows tissue-specific 
imprinting, at least in prenatal development. Based on this
expression pattern, the Elongin A3 gene is useful, for
example, as a marker for tissue-specific imprinting.

[0088] It is known from previous studies that the elongin
(SIII) complex, which includes elongin A1, strongly
stimulates the rate of elongation by RNA polymerase II by
suppressing transient pausing by polymerase at many sites
along the DNA. Elongin (SIII) is composed of a
transcriptionally active A subunit and two small regulatory
B and C subunits, which bind stably to each other to form a
binary complex that interacts with elongin A and strongly
induces its transcriptional activity. Elongin A1, B, and C
are highly conserved between mammals and yeast (Aso, T. et
al., Biochem Biophys Res Commun., 241(2):334-40 (1997)).

[0089] The elongin (SIII) complex is known to be a 
potential target for negative regulation by the von Hippel- 
Lindau (VHL) tumor suppressor protein, which is capable of 
binding stably to the elongin BC complex and preventing it 
from activating elongin A. Additionally, it is known that 
both the elongin A elongation activation domain and the 
VHL tumor suppressor protein interact with the elongin BC 
complex through a conserved elongin BC binding site motif 
that is essential for induction of elongin A activity by elongin 
BC and for tumor suppression by the VHL protein (Aso T., 
et al., EMBO J, 15(20):5557-66 (1996)). Elongin A2 is also 
known to stimulate transcription by RNA Polymerase II. 

[0090] Based on these results with elongin A1 and elongin
A2, elongin A3 polynucleotides and polypeptides of the 
present invention have utility in in vitro transcription reac- 
tions, in stimulating transcription by RNA polymerase. 
Additionally, elongin A3 polynucleotides and polypeptides 
of the invention have utility in identifying additional tumor 
suppressor genes which interact with transcriptional 
machinery since it is known that at least one transcription 
factor interacts with an elongin complex, as discussed 
above. 

[0091] As used herein, the term "isolated," "substantially
purified" or "substantially pure" means that the molecule
being referred to, for example, a polypeptide or a
polynucleotide, is in a form that is relatively free of
proteins, nucleic acids, lipids, carbohydrates or other
materials with which it is naturally associated. Generally,
a substantially pure polypeptide, polynucleotide, or other
molecule constitutes at least twenty percent of a sample,
generally constitutes at least about fifty percent of a
sample, usually constitutes at least about eighty percent of
a sample, and particularly constitutes about ninety percent
or ninety-five percent or more of a sample. A determination
that a polypeptide or a polynucleotide of the invention is
substantially pure
can be made using well known methods, for example, by 
performing electrophoresis and identifying the particular 
molecule as a relatively discrete band. A substantially pure 
polynucleotide, for example, can be obtained by cloning the 
polynucleotide, or by chemical or enzymatic synthesis. A 
substantially pure polypeptide can be obtained, for example, 
by using methods of protein purification, such as chromato- 
graphic or electrophoretic methods. 

[0092] In another aspect, the present invention provides an 
isolated polypeptide according to SEQ ID NO:2. 

[0093] In another aspect, the invention provides an isolated
or purified polynucleotide that encodes an elongin A3
polypeptide described herein (SEQ ID NO: 2), e.g., a 
full-length protein or a fragment thereof, e.g., a biologically 
active portion of the elongin A3 protein. Also included is a 
nucleic acid fragment suitable for use as a hybridization 
probe, which can be used, e.g., to identify a nucleic acid 
molecule encoding a polypeptide of the invention, an elon- 
gin A3 mRNA, and fragments suitable for use as primers, 
e.g., PCR primers for the amplification or mutation of 
nucleic acid molecules. In one embodiment, an isolated 
polynucleotide of the invention includes the nucleotide 
sequence shown in SEQ ID NO: 1, or a portion of any of 
these nucleotide sequences. 

[0094] In another embodiment, an isolated polynucleotide 
of the invention includes a nucleic acid molecule which is a 
complement of the nucleotide sequence shown in SEQ ID 
NO: 1, or a portion of any of these nucleotide sequences. In 
other embodiments, the nucleic acid molecule of the inven- 
tion is sufficiently complementary to the nucleotide 
sequence shown in SEQ ID NO: 1, such that it can hybridize 
(e.g., under high stringency conditions) to the nucleotide 
sequence shown in SEQ ID NO: 1 or 3, thereby forming a 
stable duplex. 

[0095] In one embodiment, an isolated polynucleotide of 
the present invention includes a nucleotide sequence which 
is at least about: 60%, 65%, 70%, 75%, 80%, 85%, 90%, 
91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more 
homologous to the entire length of the nucleotide sequence 
shown in SEQ ID NO: 1, or a portion, preferably of the same 
length, of any of these nucleotide sequences. 

[0096] A nucleic acid molecule of the invention can 
include only a portion of the nucleic acid sequence of SEQ 
ID NO: 1. For example, such a nucleic acid molecule can 
include a fragment which can be used as a probe or primer 
or a fragment encoding a portion of an elongin A3 protein, 
e.g., an immunogenic or biologically active portion of an 
elongin A3 protein. The nucleotide sequence determined 
from the cloning of the elongin A3 gene allows for the 
generation of probes and primers designed for use in iden- 
tifying and/or cloning other elongin A3 family members, or 
fragments thereof, as well as elongin A3 homologues, or
fragments thereof, from other species. 

[0097] In another embodiment, a nucleic acid encodes a
polypeptide fragment of elongin A3 described herein.
Nucleic acid fragments can encode a specific domain or site
described herein or fragments thereof, particularly
fragments thereof which are at least 100, 150, 200, 210, 220,
230, 240, 250, 300, 350, 400, 450, 500, 550, or 546 amino
acids in length. Nucleic acid fragments should not be
construed as encompassing those fragments that may have
been disclosed prior to the invention.

[0098] A nucleic acid fragment can include a sequence 
corresponding to a domain, region, or functional site 
described herein. A nucleic acid fragment can also include 
one or more domains, regions, or functional sites described
herein. Thus, for example, an elongin A3 nucleic acid 
fragment can include a sequence corresponding to a domain 
that binds other elongins, similar to the domain of elongin 
A1 which binds elongins B and C. Elongin A3 probes and
primers are provided. Typically a probe/primer is an isolated 
or purified oligonucleotide. The oligonucleotide typically 
includes a region of nucleotide sequence that hybridizes 
under moderate, or preferably high stringency conditions to 
at least about 7, 12 or 15, preferably about 20 or 25, more 
preferably about 30, 35, 40, 45, 50, 55, 60, 65, or 75 
consecutive nucleotides of a sense or antisense sequence of 
SEQ ID NO: 1, or of a naturally occurring allelic variant or 
mutant of SEQ ID NO: 1. 

[0099] In an embodiment, the nucleic acid is a probe which
is at least 5 or 10, and less than 200, more preferably less
than 100, or less than 50, base pairs in length. It should be
identical to, or differ by 1, or by fewer than 5 or 10, bases
from a sequence disclosed herein. If alignment is needed for
this comparison, the sequences should be aligned for maximum
homology. "Looped" out sequences from deletions or
insertions, or mismatches, are considered differences.
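
The comparison rule in this paragraph (align first, then count every mismatch and every looped-out, i.e. gapped, position as a difference) can be sketched as follows. The Python sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps. The function names and the default thresholds are illustrative choices drawn from the ranges above, not a prescribed implementation.

```python
def count_differences(aligned_a, aligned_b):
    """Count mismatches and gap ("looped out") columns between two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    return sum(1 for x, y in pairs if x != y)


def is_acceptable_probe(aligned_probe, aligned_ref,
                        min_len=10, max_len=200, max_diff=5):
    """Check probe length (gaps excluded) and the number of differences.

    Defaults (at least 10, less than 200 bases, fewer than 5 differences)
    are one combination of the values listed in the text.
    """
    probe_len = len(aligned_probe.replace("-", ""))
    if not (min_len <= probe_len < max_len):
        return False
    return count_differences(aligned_probe, aligned_ref) < max_diff
```

For example, `count_differences("ACGT-A", "ACCTGA")` counts one mismatch and one gap column.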

[0100] In another embodiment a set of primers is pro- 
vided, e.g., primers suitable for use in a PCR, which can be 
used to amplify a selected region of an elongin A3 sequence,
e.g., a domain, region, site or other sequence described 
herein. The primers should be at least 5, 10, or 50 base pairs 
in length and less than 100, or less than 200, base pairs in 
length. The primers should be identical, or differ by one
base from a sequence disclosed herein or from a naturally 
occurring variant. 

[0101] A nucleic acid fragment can encode an epitope-bearing
region of a polypeptide described herein. A nucleic acid
fragment encoding a "biologically active portion of an
elongin A3 polypeptide" can be prepared by isolating a
portion of the nucleotide sequence of SEQ ID NO: 1 which
encodes a polypeptide having an elongin A3 biological
activity (e.g., the biological activities of the elongin A3
protein are described herein), expressing the encoded
portion of the elongin A3 protein (e.g., by recombinant
expression in vitro) and assessing the activity of the
encoded portion of the elongin A3 protein. For example, a
nucleic acid fragment encoding a biologically active portion
of elongin A3 includes a domain that binds with other
elongins such as elongin B or elongin C. A nucleic acid
fragment encoding a biologically active portion of an
elongin A3 polypeptide can comprise a nucleotide sequence
which is 300 or more nucleotides in length.

[0102] In certain embodiments, a nucleic acid of the 
invention includes a nucleotide sequence which is about 
300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 
1800, 1900, or more nucleotides in length and hybridizes 
under moderate or high stringency conditions to a nucleic 
acid molecule of SEQ ID NO: 1. 

[0103] In embodiments, the fragment includes at least one, 
and preferably at least 5, 10, 15, 25, 50, 75, 100, 200, 300, 
500, 1000, 1500, or 1620 nucleotides encoding a protein
including 5, 10, 15, 20, 25, 30, 40, 50, 100, 200, 210, 220, 
230, 240, 250, 300, 350, 400, 450, 500, or 546 consecutive 
amino acids of SEQ ID NO: 2. 

[0104] The invention further encompasses nucleic acid 
molecules that differ from the nucleotide sequence shown in 
SEQ ID NO: 1. Such differences can be due to degeneracy 
of the genetic code, and result in a nucleic acid which 
encodes the same elongin A3 protein as that encoded by the 
nucleotide sequence disclosed herein. In another embodi- 
ment, an isolated polynucleotide of the invention has a 
nucleotide sequence encoding a protein having an amino 
acid sequence which differs by at least 1, but fewer than 5,
10, 20, 50, or 100, amino acid residues from that shown in
SEQ ID
NO: 2. If alignment is needed for this comparison the 
sequences should be aligned for maximum homology. 
"Looped" out sequences from deletions or insertions, or 
mismatches, are considered differences. 

[0105] As used herein, the term "selective hybridization"
or "selectively hybridize" refers to hybridization under 
moderately stringent or highly stringent physiological con- 
ditions, which can distinguish related nucleotide sequences 
from unrelated nucleotide sequences. 

[0106] As known in the art, in nucleic acid hybridization 
reactions, the conditions used to achieve a particular level of 
stringency will vary, depending on the nature of the nucleic 
acids being hybridized. For example, the length, degree of 
complementarity, nucleotide sequence composition (for 
example, relative GC:AT content), and nucleic acid type, 
i.e., whether the oligonucleotide or the target nucleic acid 
sequence is DNA or RNA, can be considered in selecting 
hybridization conditions. An additional consideration is 
whether one of the nucleic acids is immobilized, for 
example, on a filter. Methods for selecting appropriate 
stringency conditions can be determined empirically or 
estimated using various formulas, and are well known in the 
art (see, for example, Sambrook et al., supra, 1989). 
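
As one example of the kind of formula alluded to above, the widely used Wallace rule estimates the melting temperature of a short oligonucleotide as Tm = 2 °C x (A + T) + 4 °C x (G + C). This rule is not named in the text and is only a rough guide for oligonucleotides of roughly 14 to 20 nucleotides. The following Python sketch, with the hypothetical function name `wallace_tm`, is offered only as an illustration of how base composition feeds into the choice of hybridization conditions.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo may contain only A, C, G and T")
    return 2 * at + 4 * gc
```

Because each G or C contributes twice as much as each A or T, a GC-rich probe of a given length calls for more stringent conditions than an AT-rich probe of the same length, which is consistent with the role of relative GC:AT content noted above.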

[0107] An example of progressively higher stringency 
conditions is as follows: 2xSSC/0.1% SDS at about room 
temperature (hybridization conditions); 0.2xSSC/0.1% SDS 
at about room temperature (low stringency conditions); 
0.2xSSC/0.1% SDS at about 42° C. (moderate stringency 
conditions); and 0.1xSSC at about 68° C. (high stringency
conditions). Washing can be carried out using only one of 
these conditions, for example, high stringency conditions, or 
each of the conditions can be used, for example, for 10 to 15 
minutes each, in the order listed above, repeating any or all 
of the steps listed. 

[0108] Nucleic acids of the invention can be chosen for 
having codons, which are compatible, or noncompatible, for 
a particular expression system, e.g., the nucleic acid can be 
one in which at least one codon, or at least 10%, or 20% of 
the codons has been altered such that the sequence is 
optimized for expression in E. coli, yeast, human, insect, or 
CHO cells, for example. 

[0109] Nucleic acid variants can be naturally occurring, 
such as allelic variants (same locus), homologs (different 
locus), and orthologs (different organism), or can be
non-naturally occurring. Non-naturally occurring variants
can be made by mutagenesis techniques, including those
applied to polynucleotides, cells, or organisms. The
variants can contain nucleotide substitutions, deletions,
inversions and insertions. Variation can occur in either or
both the coding and non-coding regions. The variations can
produce both conservative and non-conservative amino acid
substitutions (as compared in the encoded product).

[0110] Orthologs, homologs, and allelic variants can be 
identified using methods known in the art. These variants 
comprise a nucleotide sequence encoding a polypeptide that 
is at least about 50%, at least about 55%, typically at least about 70-75%,
more typically at least about 80-85%, and most typically at 
least about 90-95% or more identical to the nucleotide 
sequence shown in SEQ ID NO: 2 or a fragment of this 
sequence. Such nucleic acid molecules can readily be iden- 
tified as being able to hybridize under moderate, or prefer- 
ably high stringency condition, to the nucleotide sequence 
shown in SEQ ID NO: 2 or a fragment of the sequence. 
Nucleic acid molecules corresponding to orthologs, 
homologs, and allelic variants of the elongin A3 cDNAs of 
the invention can further be isolated by mapping to the same 
chromosome or locus as the elongin A3 gene. 

[0111] Allelic variants of elongin A3, e.g., human elongin 
A3, include both functional and non-functional proteins. 
Functional allelic variants are naturally occurring amino
acid sequence variants of the elongin A3 protein within a 
population that maintain the ability to increase the speed of 
transcription by RNA polymerase. Functional allelic vari- 
ants will typically contain only conservative substitution of 
one or more amino acids of SEQ ID NO: 2, or substitution, 
deletion or insertion of non-critical residues in non-critical 
regions of the protein. Non-functional allelic variants are 
naturally-occurring amino acid sequence variants of the 
elongin A3, e.g., human elongin A3, protein within a popu- 
lation that do not have the ability to participate in redox 
reactions or molecular chaperone interactions. Non-func- 
tional allelic variants will typically contain a non-conserva- 
tive substitution, a deletion, or insertion, or premature trun- 
cation of the amino acid sequence of SEQ ID NO: 2, or a 
substitution, insertion, or deletion in critical residues or 
critical regions of the protein. As disclosed herein, a
polymorphism identified for elongin A3 includes a G or A
residue at position 910.

[0112] In another aspect, the invention features an isolated
polynucleotide which is antisense to elongin A3. An
"antisense" nucleic acid can include a nucleotide sequence 
which is complementary to a "sense" nucleic acid encoding 
a protein, e.g., complementary to the coding strand of a 
double-stranded cDNA molecule or complementary to an
mRNA sequence. The antisense nucleic acid can be comple- 
mentary to an entire elongin A3 coding strand, or to only a 
portion thereof. 

[0113] An antisense nucleic acid can be designed such that 
it is complementary to the entire coding region of elongin A3 
mRNA, but more preferably is an oligonucleotide which is 
antisense to only a portion of the coding or noncoding region 
of elongin A3 mRNA. For example, the antisense oligo- 
nucleotide can be complementary to the region surrounding 
the translation start site of elongin A3 mRNA, e.g., between 
the -10 and +10 regions of the target gene nucleotide 
sequence of interest. An antisense oligonucleotide can be, 
for example, about 7, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 
60, 65, 70, 75, 80, or more nucleotides in length. 
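
A minimal sketch of the design described above: take the window from -10 to +10 around the translation start site and return its reverse complement as the antisense oligonucleotide. The Python sketch assumes the mRNA is supplied as a DNA-alphabet string, that +1 is the A of the ATG start codon with no position 0, and that `start_index` (a hypothetical parameter) is the 0-based index of that A.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_oligo(mrna, start_index, flank=10):
    """Antisense oligo covering -flank..+flank around the start codon.

    start_index is the 0-based index of the A of ATG (assumed convention),
    so the window holds `flank` bases upstream plus `flank` bases starting at A.
    """
    if start_index - flank < 0 or start_index + flank > len(mrna):
        raise ValueError("window extends beyond the sequence")
    window = mrna[start_index - flank:start_index + flank]
    return reverse_complement(window)
```

With the default `flank=10` the result is a 20-nucleotide oligonucleotide, which falls within the length range listed in the paragraph.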

[0114] The invention also provides detectably labeled
oligonucleotide primer and probe molecules. Typically, such
labels are chemiluminescent, fluorescent, radioactive, or
colorimetric. The invention also includes molecular beacon
oligonucleotide primer and probe molecules having at least
one region which is complementary to an elongin A3 nucleic
acid of the invention, two complementary regions one 
having a fluorophore and one a quencher such that the 
molecular beacon is useful for quantitating the presence of 
the elongin A3 nucleic acid of the invention in a sample. 
Molecular beacon nucleic acids are described, for example, 
in Lizardi et al., U.S. Pat. No. 5,854,033; Nazarenko et al., 
U.S. Pat. No. 5,866,336, and Livak et al., U.S. Pat. No. 
5,876,930. 

[0115] In another aspect, the invention features an isolated
elongin A3 protein, or fragment, e.g., a biologically
active portion, for use as immunogens or antigens to raise or
test (or more generally to bind) anti-elongin A3 antibodies.
Elongin A3 protein can be isolated from cells or tissue
sources using standard protein purification techniques.
Elongin A3 protein or fragments thereof can be produced by
recombinant DNA techniques or synthesized chemically.

[0116] Polypeptides of the invention include those which 
arise as a result of the existence of multiple genes, alterna- 
tive transcription events, alternative RNA splicing events, 
and alternative translational and post-translational events. 
The polypeptide can be expressed in systems, e.g., cultured 
cells, which result in substantially the same
post-translational modifications present when the polypeptide
is expressed in a native cell, or in systems which result in the
alteration or omission of post-translational modifications, 
e.g., glycosylation or cleavage, present when expressed in a 
native cell. 

[0117] In an embodiment, an elongin A3 polypeptide has a
molecular weight, e.g., a deduced molecular weight,
preferably ignoring any contribution of post-translational
modifications, amino acid composition or other physical
characteristic of SEQ ID NO: 2, or it has an overall sequence
similarity of at least 50%, preferably at least 60%, more
preferably at least 70, 80, 90, or 95%, with a polypeptide
of SEQ ID NO: 2.

[0118] In another aspect, the invention provides an anti- 
elongin A3 antibody, or a fragment thereof (e.g., an antigen- 
binding fragment thereof). The term "antibody" as used 
herein refers to an immunoglobulin molecule or immuno- 
logically active portion thereof, i.e., an antigen-binding 
portion. As used herein, the term "antibody" refers to a 
protein comprising at least one, and preferably two, heavy 
(H) chain variable regions (abbreviated herein as VH), and 
at least one and preferably two light (L) chain variable 
regions (abbreviated herein as VL). The anti-elongin A3 
antibody can be a polyclonal or a monoclonal antibody. In 
other embodiments, the antibody can be recombinantly 
produced, e.g., produced by phage display or by combina- 
torial methods. 

[0119] In embodiments an antibody can be made by 
immunizing with purified elongin A3 antigen, or a fragment 
thereof, e.g., a fragment described herein, membrane asso- 
ciated antigen, tissue, e.g., crude tissue preparations, whole 
cells, preferably living cells, lysed cells, or cell fractions, 
e.g., membrane fractions. A full-length elongin A3 protein
or an antigenic peptide fragment of elongin A3 can be used as
an immunogen or can be used to identify anti-elongin A3
antibodies made with other immunogens, e.g., cells, membrane
preparations, and the like. The antigenic peptide of
elongin A3 should include at least 8 amino acid residues of 
the amino acid sequence shown in SEQ ID NO: 2 and 
encompasses an epitope of elongin A3. Preferably, the 
antigenic peptide includes at least 10 amino acid residues, 
more preferably at least 15 amino acid residues, even more 
preferably at least 20 amino acid residues, and most pref- 
erably at least 30 amino acid residues. 

[0120] An anti-elongin A3 antibody (e.g., monoclonal 
antibody) can be used to isolate elongin A3 by standard 
techniques, such as affinity chromatography or immunopre- 
cipitation. Moreover, an anti-elongin A3 antibody can be 
used to detect elongin A3 protein (e.g., in a cellular lysate or 
cell supernatant) in order to evaluate the abundance and 
pattern of expression of the protein. Anti-elongin A3 anti- 
bodies can be used diagnostically to monitor protein levels 
in tissue as part of a clinical testing procedure, e.g., to 
determine the efficacy of a given treatment regimen. 

[0121] The invention also includes cell lines, e.g., hybridomas,
which make an anti-elongin A3 antibody, e.g., an
antibody described herein, and methods of using said cells to
make an elongin A3 antibody.

[0122] In another aspect, the invention includes vectors,
preferably expression vectors, containing a nucleic acid
encoding an elongin A3 polypeptide, preferably the elongin
A3 polypeptide of SEQ ID NO: 2. The vector can be capable
of autonomous replication or it can integrate into a host
DNA. Viral vectors include, e.g., replication-defective retroviruses,
adenoviruses and adeno-associated viruses.

[0123] A vector can include an elongin A3 nucleic acid in
a form suitable for expression of the nucleic acid in a host
cell. Preferably the recombinant expression vector includes
one or more regulatory sequences operatively linked to the
nucleic acid sequence to be expressed. The term "regulatory
sequence" includes promoters, enhancers and other expression
control elements (e.g., polyadenylation signals). Regulatory
sequences include those which direct constitutive
expression of a nucleotide sequence, as well as tissue- 
specific regulatory and/or inducible sequences. The design 
of the expression vector can depend on such factors as the 
choice of the host cell to be transformed, the level of 
expression of protein desired, and the like. The expression 
vectors of the invention can be introduced into host cells to 
thereby produce proteins or polypeptides, including fusion 
proteins or polypeptides, encoded by nucleic acids as 
described herein (e.g., elongin A3 proteins, mutant forms of 
elongin A3 proteins, fusion proteins, and the like). 

[0124] The recombinant expression vectors of the inven- 
tion can be designed for expression of elongin A3 proteins 
in prokaryotic or eukaryotic cells. For example, polypep- 
tides of the invention can be expressed in E. coli, insect cells 
(e.g., using baculovirus expression vectors), yeast cells or 
mammalian cells. Suitable host cells are discussed further in 
Goeddel, (1990) Gene Expression Technology: Methods in 
Enzymology 185, Academic Press, San Diego, Calif. Alter- 
natively, the recombinant expression vector can be tran- 
scribed and translated in vitro, for example using T7 pro- 
moter regulatory sequences and T7 polymerase. 

[0125] Expression of proteins in prokaryotes is most often 
carried out in E. coli with vectors containing constitutive or 
inducible promoters directing the expression of either fusion 
or non-fusion proteins. Fusion vectors add a number of
amino acids to a protein encoded therein, usually to the 
amino terminus of the recombinant protein. Such fusion 
vectors typically serve three purposes: 1) to increase expres- 
sion of recombinant protein; 2) to increase the solubility of 
the recombinant protein; and 3) to aid in the purification of 
the recombinant protein by acting as a ligand in affinity 
purification. Often, a proteolytic cleavage site is introduced 
at the junction of the fusion moiety and the recombinant 
protein to enable separation of the recombinant protein from 
the fusion moiety subsequent to purification of the fusion 
protein. Such enzymes, and their cognate recognition 
sequences, include Factor Xa, thrombin and enterokinase. 
Typical fusion expression vectors include pGEX (Pharmacia 
Biotech Inc; Smith, D. B. and Johnson, K. S. (1988) Gene 
67:31-40), pMAL (New England Biolabs, Beverly, Mass.) and
pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glutathione 
S-transferase (GST), maltose E binding protein, or protein 
A, respectively, to the target recombinant protein. 

[0126] The elongin A3 expression vector can be a yeast 
expression vector, a vector for expression in insect cells, 
e.g., a baculovirus expression vector or a vector suitable for 
expression in mammalian cells. When used in mammalian 
cells, the expression vector's control functions can be pro- 
vided by viral regulatory elements. For example, commonly 
used promoters are derived from polyoma, Adenovirus 2,
cytomegalovirus and Simian Virus 40. 

[0127] In another aspect, the invention provides a host cell
which includes a nucleic acid molecule described herein,
e.g., an elongin A3 nucleic acid molecule within a recombinant
expression vector or an elongin A3 nucleic acid molecule
containing sequences which allow it to homologously
recombine into a specific site of the host cell's genome. The
terms "host cell" and "recombinant host cell" are used 
interchangeably herein. Such terms refer not only to the 
particular subject cell but to the progeny or potential prog- 
eny of such a cell. Because certain modifications may occur 
in succeeding generations due to either mutation or envi- 
ronmental influences, such progeny may not, in fact, be 
identical to the parent cell, but are still included within the 
scope of the term as used herein. 

[0128] A host cell can be any prokaryotic or eukaryotic 
cell. For example, an elongin A3 protein can be expressed in
bacterial cells (such as E. coli), insect cells, yeast or mammalian
cells (such as Chinese hamster ovary cells (CHO) or
COS cells (African green monkey kidney cells CV-1 origin
SV40 cells; Gluzman (1981) Cell 23:175-182)). Other suitable
host cells are known to those skilled in the art.

[0129] Vector DNA can be introduced into host cells via 
conventional transformation or transfection techniques. As 
used herein, the terms "transformation" and "transfection"
are intended to refer to a variety of art-recognized techniques
for introducing foreign nucleic acid (e.g., DNA) into
a host cell, including calcium phosphate or calcium chloride 
co-precipitation, DEAE-dextran-mediated transfection, 
lipofection, or electroporation. 

[0130] A host cell of the invention can be used to produce 
(i.e., express) an elongin A3 protein. Accordingly, the inven- 
tion further provides methods for producing an elongin A3 
protein using the host cells of the invention. In one embodi- 
ment, the method includes culturing the host cell of the 
invention (into which a recombinant expression vector
encoding an elongin A3 protein has been introduced) in a
suitable medium such that an elongin A3 protein is pro- 
duced. In another embodiment, the method further includes 
isolating an elongin A3 protein from the medium or the host 
cell. 

[0131] In another aspect, the invention features a cell or
purified preparation of cells which include an elongin A3
transgene, or which otherwise misexpress elongin A3. The
cell preparation can consist of human or non-human cells,
e.g., rodent cells, e.g., mouse or rat cells, rabbit cells, or pig
cells. In embodiments, the cell or cells include an elongin A3
transgene, e.g., a heterologous form of an elongin A3, e.g., a
gene derived from humans (in the case of a non-human cell).

[0132] The invention provides non-human transgenic ani- 
mals. Such animals are useful for studying the function 
and/or activity of an elongin A3 protein and for identifying
and/or evaluating modulators of elongin A3 activity. As used 
herein, a "transgenic animal" is a non-human animal, pref- 
erably a mammal, more preferably a rodent such as a rat or 
mouse, in which one or more of the cells of the animal 
includes a transgene. Other examples of transgenic animals 
include non-human primates, sheep, dogs, cows, goats, 
chickens, amphibians, and the like. A transgene is exog- 
enous DNA or a rearrangement, e.g., a deletion of endog- 
enous chromosomal DNA, which preferably is integrated 
into or occurs in the genome of the cells of a transgenic 
animal. A transgene can direct the expression of an encoded 
gene product in one or more cell types or tissues of the 
transgenic animal; other transgenes, e.g., a knockout, reduce
expression. Thus, a transgenic animal can be one in which
an endogenous elongin A3 gene has been altered, e.g., by
homologous recombination between the endogenous gene 
and an exogenous DNA molecule introduced into a cell of 
the animal, e.g., an embryonic cell of the animal, prior to 
development of the animal. 

[0133] Elongin A3 proteins or polypeptides can be 
expressed in transgenic animals or plants, e.g., a nucleic acid 
encoding the protein or polypeptide can be introduced into 
the genome of an animal. In embodiments the nucleic acid 
is placed under the control of a tissue-specific promoter, e.g.,
a milk- or egg-specific promoter, and the encoded protein is recovered from the milk
or eggs produced by the animal. Suitable animals are mice, 
pigs, cows, goats, and sheep. 

[0134] The invention also provides a method for identi- 
fying the DNA methylation status at a cytosine residue of a 
CpG sequence in a genomic DNA sample from a subject,
wherein hypomethylation of CpG sequences compared to a 
methylated DNA control sample is indicative of a disease 
present within the subject. In one aspect, the method is 
characterized in that a set of CpG positions comprises at 
least 3 CpG positions that are located in the regulatory 
region of the same gene. In another aspect, the method is 
characterized in that the methylation state of at least 3 
different sets of CpG positions is identified.

[0135] The following examples are intended to illustrate 
but not limit the invention. 

EXAMPLE 1 

[0136] Isolation of Normally Methylated CpG Islands or 
GC Rich Regions from a Genome-Wide Screen

[0137] This example illustrates a method for isolating 
normally methylated CpG islands or GC rich regions in a 
genome-wide screen. 
[0138] Isolation and Identification of Methylated CpG
Islands or GC Rich Regions from Genomic DNA 

[0139] A two-step cloning procedure was used for isolat- 
ing and identifying methylated CpG islands or GC rich 
regions from genomic DNA. In the first step, 200 µg of
genomic DNA were digested overnight with 1000 units of 
Hpa II (LTI) followed by a 5-h digest with 600 units of Mse 
I (NEB), according to the manufacturer's conditions, and the 
volume was reduced using a SpeedVac concentrator 
(Savant). Fragments >1 kb were size selected using a Chromaspin+TE-400
column (Clontech), and fragments between
1-9 kb were purified from a 0.8% gel by electroelution and 
an Elutip-D column (S&S). The eluate was ethanol precipi- 
tated, cloned into the compatible Nde I site of pGEM-4Z, 
which was first modified to abolish the Sma I site, trans- 
formed into the competent cells of the restriction-deficient 
strain XL2-Blue MRF' (Stratagene), and plated onto LB-ampicillin
agar plates. Library DNA was prepared directly
from plates using a plasmid Maxi kit (Qiagen). In the second 
step, 100 µg of the Mse I library DNA were digested with
1,000 U of Eag I (NEB) according to the manufacturer's 
conditions. The digest was ethanol precipitated, and 100- 
1500-bp fragments were size-selected by purification from a 
1.5% agarose gel, cloned into the Eag I site of pBC (Stratagene),
and transformed into XL1-Blue MRF' (Stratagene).

[0140] DNA Sequencing 

[0141] DNA sequencing was performed using an ABI 377 
automated sequencer following protocols recommended by 
the manufacturer (Perkin-Elmer). The sequences were analyzed
by BLAST search (Altschul et al. 1990) of the
GenBank and Celera databases.

[0142] Southern Hybridization 

[0143] Genomic DNA was digested with Mse I alone or 
Mse I together with a methylcytosine-sensitive (Hpa II, LTI, 
or Sma I, NEB) or methyl-insensitive (Msp I or Xma I, 
NEB) restriction endonuclease according to the manufac- 
turer's conditions. Southern hybridization was performed as 
described (Dyson 1991). 

[0144] Imprinting Analysis of Elongin A3 Gene 

[0145] Fetal tissues and matched maternal decidua were 
obtained from the University of Washington Fetal Tissue 
Bank. We identified polymorphisms by sequencing fetal and 
maternal PCR amplified genomic DNA. The following 
conditions were used for PCR amplifications: 95° C, 2 min; 
then 40 cycles of 95° C. 1 min, 60° C. 30 sec, 72° C. min; 
then 72° C. for 9 min. Total RNA was isolated from fetal 
tissues using RNeasy mini kit (Qiagen). To eliminate DNA 
contamination from RNA preparations, samples were treated
with pre amplification-grade DNase I (Invitrogen) according 
to supplied protocols. RT-PCR was carried out using the 
Superscript II preamplification system (Invitrogen) and was 
performed for each sample in the presence and absence 
(negative controls) of RT. cDNA samples were sequenced 
only when no bands were obtained with the negative con- 
trols. The primers used for the imprinting analysis were 
EL2AL-1093-1112F: 5'-TCT GCT GTC CGC TTT TGA 
GG-3' (SEQ ID NO:32) and EL2AL-1526-1550R: 5'-ATC
GGA TTT TCG TGG TCA CTA CTC G-3' (SEQ ID NO:33).
DNA and cDNA sequencing was run on an ABI-377 auto- 
mated sequencer following protocols recommended by the 
manufacturer (Perkin-Elmer). 



[0146] Isolation of Normally Methylated CpG Islands or 
GC Rich Regions 

[0147] A restriction enzyme-based strategy was chosen for 
isolating methylated CpG islands over a PCR-based strat- 
egy, to avoid known problems of amplification bias against 
GC-rich sequences, and to obtain larger clone inserts than 
would be possible by a PCR-based approach. DNA from 
tissue from a male was used, to avoid cloning methylated 
CpG islands from the inactive X chromosome, and to avoid 
cell culture-induced DNA methylation. The tissue chosen
was a Wilms tumor, because this approach would identify 
either normally methylated CpG islands or those methylated 
specifically in this tumor, which is of interest to our labo- 
ratory. The plan was to determine after cloning these 
sequences whether they were methylated in normal cells or 
in tumors. The first step of the approach (FIG. 1) involved 
double digestion with Mse I, which recognizes the sequence
TTAA, and Hpa II, which recognizes the sequence CCGG at
unmethylated sites. Mse I digests DNA between CpG 
islands, and Hpa II digests unmethylated CpG islands into 
small fragments, as it has a 4-bp recognition sequence. 
These digestions were followed by gel purification of frag- 
ments >1 kb in length. These initial digestions and purifi- 
cation were predicted by computer analysis of GenBank to 
enrich ~10-fold for CpG islands, and enrichment of known
methylated CpG islands (near imprinted genes) was con- 
firmed by Southern blot hybridization. At the same time, this 
step eliminates all unmethylated CpG islands because of the 
methylcytosine sensitivity of Hpa II. The restriction fragments
obtained by this first step then were cloned into the
restriction-negative strain XL2-Blue MRF' to avoid bacterial 
digestion of methylated genomic DNA, and the resulting 
genomic library was termed the "Mse library." The second 
cloning step (FIG. 1) involved further enrichment of CpG 
islands by digesting the purified Mse I library DNA with an
infrequently cutting restriction endonuclease (i.e., one recognizing
6 bp CG-rich sequences) specific for sequences
common to CpG islands, to isolate relatively large fragments 
of CpG islands that are normally methylated (i.e., survived 
the first cloning step), but are now unmethylated in the Mse 
library and therefore amenable to digestion and subcloning. 
Most of the work described here was performed by using 
Eag I (recognition sequence CGGCCG) in this second step, 
and subcloning Eag I fragments in three size classes sepa- 
rated by agarose gel electrophoresis (100-500 bp, 500-1000 
bp, >1000 bp), and the resulting library was termed the Eag 
library. 
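
By way of illustration only, the effect of the first digestion step can be sketched in silico. The following Python sketch is not part of the procedure described above. It assumes the commonly cited cut positions T^TAA for Mse I and C^CGG for Hpa II, treats Hpa II as blocked when the internal cytosine of CCGG is methylated, and uses hypothetical names (`cut_positions`, `digest`, `methylated`) and a toy sequence. Positions are 0-based.

```python
import re

# (recognition site, cut offset within the site, offset of the
#  methylation-sensitive base, or None if cleavage is not blocked)
MSE_I = ("TTAA", 1, None)   # T^TAA; assumed insensitive to CpG methylation
HPA_II = ("CCGG", 1, 1)     # C^CGG; assumed blocked when the internal C is methylated


def cut_positions(seq, enzyme, methylated=frozenset()):
    """Return the 0-based positions at which the enzyme cuts seq."""
    site, cut_offset, sensitive_offset = enzyme
    cuts = []
    # A lookahead pattern also finds overlapping occurrences of the site.
    for match in re.finditer("(?=" + site + ")", seq):
        start = match.start()
        if sensitive_offset is not None and start + sensitive_offset in methylated:
            continue  # a methylated site is protected from cleavage
        cuts.append(start + cut_offset)
    return cuts


def digest(seq, methylated=frozenset()):
    """Double digest with Mse I and Hpa II; return fragments in order."""
    cuts = set(cut_positions(seq, MSE_I)) | set(cut_positions(seq, HPA_II, methylated))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "GGGTTAAGCCGGCTTAAGGG"                    # hypothetical toy sequence
print([len(f) for f in digest(toy)])             # unmethylated CCGG is cut
print([len(f) for f in digest(toy, methylated={9})])  # methylated CCGG is spared
```

In this toy case the fragment carrying the methylated CCGG site remains intact between the flanking Mse I sites, which is the property that allows methylated CpG islands to survive size selection while unmethylated ones are reduced to small fragments.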

[0148] Methylated CpG Islands Within Interspersed Repeats

[0149] The primary goal was to identify unique methy- 
lated CpG islands throughout the genome. However, it 
quickly became apparent that most of the clones in the Eag 
library represented high copy number methylated CpG 
islands. The majority of these clones were derived from a 
sequence termed SVA, which constituted 70% of the Eag I 
library, and was not previously known to be methylated. The 
little-known SVA retroposon contains a GC-rich VNTR
region, which embodies a CpG rich region between an
Alu-derived region and an LTR-derived region. Only three
such elements had previously been described (Kawajiri et al. 
1986; Zhu et al. 1992; Shen 1994), although their methyla- 
tion has not been characterized. A probe termed SVA-U was 
designed, which was unique to the SVA and present in all of 
the SVA elements, to analyze copy number and methylation
of this sequence in genomic DNA. The copy number was 
estimated by quantitative Southern hybridization to be 5000 
per haploid genome. The SVA elements were found to be 
completely methylated in all adult somatic tissues examined, 
including peripheral blood lymphocytes, kidney, adrenal, 
liver and lung. A somewhat less abundant high copy repeat, 
representing an additional 20% of the Eag I library, corre- 
sponded to the nontranscribed intergenic spacer of riboso- 
mal DNA, which was a known methylated repetitive 
sequence (Brock and Bird 1997), suggesting that ribosomal 
gene methylation may be more extensive than was previ- 
ously suspected. The focus of the current study was on the 
unique methylated CpG islands that were identified after 
excluding these sequences. 

EXAMPLE 2 

[0150] Methylation Analysis of Novel Single Copy CpG 
Islands or GC Rich Regions 

[0151] This example illustrates that the methods of the 
present invention for identifying and isolating methylated 
CpG islands or GC rich regions are effective for identifying 
imprinted genes. 

[0152] Isolation and identification of methylated CpG 
islands from genomic DNA was performed as described in 
Example 1, except that to eliminate methylated CpG islands
that corresponded to dispersed repetitive sequences, the Mse 
I library was derived by adding restriction enzymes designed 
to cleave those sequences and render them unclonable. For 
28S and ribosomal DNA, we used Asc I. For SVA, we used 
Dra III+Sal I, followed by either Acc I or Tth111 I.

[0153] To isolate single-copy clones, we re-derived the
Mse library, adding restriction endonucleases designed to 
cleave repeat sequences described above, rendering them 
unclonable (see Methods). After eliminating redundant 
clones, 62 unique clones were characterized. All of the 
sequences were GC-rich, i.e., with a measured (C+G)/ 
N>50%, and they ranged in GC content from 55 to 79%. 
Forty-three (69%) of the clones showed an observed to 
expected CpG ratio >0.6, meeting the formal definitional
requirement of a CpG rich region, and they were characterized
further. Nevertheless, most of the remaining clones
showed an observed to expected CpG ratio >0.5. 
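
For illustration, the two sequence criteria applied above, (C+G)/N and the observed to expected CpG ratio, can be computed as in the following Python sketch. The formula used for the ratio, (number of CpG × N)/(number of C × number of G), is the commonly used definition and is an assumption here, since the calculation is not spelled out above. The function names and the toy sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this is the full count
    gc_fraction = (c + g) / n
    # Expected CpG count, assuming C and G occur independently: (#C * #G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_rich(seq, min_gc=0.5, min_ratio=0.6):
    """Apply the thresholds quoted in the text: (C+G)/N > 50% and ratio > 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_ratio


print(cpg_stats("CGCGCGATAT"))    # toy sequence, not a clone from the library
print(is_cpg_rich("CGCGCGATAT"))
print(is_cpg_rich("ATATATATAT"))
```

Real clones would of course be far longer than these toy strings; the sketch only shows how the two thresholds combine.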

[0154] As the original source of DNA was a Wilms tumor,
we had no a priori knowledge of the methylation status of 
these sequences in normal tissue. Surprisingly, all of the 
sequences were methylated in normal lymphocyte DNA 
(FIG. 2A). Methylation was not restricted to lymphocyte 
DNA, as it also was observed in both adult and fetal tissues, 
including brain, gut, kidney, liver, lung, and skin (FIG. 2B). 
Thus, these sequences represented normally methylated 
CpG islands. 

[0155] To determine whether the CpG islands were dif- 
ferentially methylated in the maternal and paternal germline, 
30 of the clones were individually hybridized to Southern 
blots of DNA isolated from ovarian teratomas (OT) and 
complete hydatidiform moles (CHM), which are of unipa- 
rental maternal and paternal origin, respectively (CHM 
DNA was exhausted at that point). Thirteen clones exhibited 
methylation in the OT but not, or significantly less so, in the
CHM (Table 1). For example, CpG rich region 2-78 showed 
complete digestion after Hpa II treatment of genomic DNA 
isolated from sperm and CHM, similar to the pattern after 
Msp I digestion (FIG. 3A). In contrast, 2-78 showed an 
identical pattern after Mse I+Hpa II digestion, as after Mse 
I alone, in OT (FIG. 3A). Similarly, FIG. 3A shows 
OT-specific methylation of CpG islands 3-30, 1-13, 4-6, and 
2-48, with relative lack of methylation in CHM. These 
sequences therefore represent differentially methylated 
regions, because of their different pattern of methylation in 
germline tissues of male (sperm and CHM) and female (OT) 
origin. Because many of these sequences also are methylated
in somatic tissues, we refer to them as gDMRs
(germline differentially methylated regions). All of the 
gDMR sequences were methylated in OT and not CHM. As 
a negative control, a CpG rich region associated with the RB 
gene (retinoblastoma) is unmethylated in both CHM and OT. 
As a positive control, a CpG rich region upstream of the 
imprinted gene H19 is preferentially methylated in CHM,
and a CpG rich region within the imprinted SNRPN gene is 
methylated in OT (FIG. 3B). 



TABLE 1

Methylated CpG Islands Characterized In Detail

          SEQ ID  Methylation  Chromosomal  Associated genes
Clone ID  No.     pattern      location     Accession no.  Predicted function

3-10      3       SMR          1q44         14042537^a     Similar to Zn finger protein
                                            14423768       Olfactory receptor 2T1
                                            hCG1644736     Similar to olfactory receptor 2T1
                                            hCG1724357     Similar to olfactory receptor 2T1
3-20      4       SMR          2q36         5174481^a      Histone deacetylase A
                                            hCG1651464
                                            hCG1656118
                                            hCG1651461
                                            hCG1651466
1-19      5       SMR          4p16         37775830^a     WFS1 (wolframin)
1-41      6       SMR          4q35         hCG1788598^a   Similar to mouse pair-rule gene ODZ3
                                            hCG1793025^a   Hypothetical protein
                                            hCG1787540     Similar to mouse pair-rule gene ODZ3
4-8       7       SMR          4q35         hCG1788598^a   Adjacent to but distinct from 1-41
3-4       8       gDMR         6q24         hCG1660630^a   HYMAI
4-7       9       SMR          7p22         6806913^a      Centaurin-α
                                            hCG1747708
                                            hCG1790856
                                            hCG1747710     Cytochrome P450 homolog
1-30      10      SMR          7q11.1       hCG1779529^a   Rab5 exchange factor homolog
                                            hCG1779527     Antisense to hCG1779529
                                            hCG1789113     Antisense to hCG1779529
                                            13642872       60S ribosomal prot L35
                                            6572672        Putative transcription factor
1-22      11      SMR          7q36         11386149^a     Tyrosine phosphatase
                                            hCG1799787     BHLF1 protein
3-2       12      SMR          8p23         hCG1659058^a   Proline-rich mucin homolog
2-5       13      gDMR         8q21.2       17451956       Similar to antigen GOR
                                            hCG1757665
2-48      14      gDMR         9p13         hCG1659616
1-20      15      gDMR         10q26        13325182^a
                                            hCG1799063
3-12      16      SMR          10q26        3122245^a      Inositol triphosphate phosphatase
                                            hCG1654478
3-30      17      gDMR         11q25        17456499^a     Hypothetical gene
                                            hCG37607
                                            hCG1745526     Hypothetical protein
1-5       18      SMR          13q34        hCG20146^a
1-21      19      gDMR         14q32        8393715        Hepatocellular cancer candidate
                                            hCG21408       Similar to Drosophila CLIP-190
2-42      20      gDMR         16p13.1      hCG15669^a     Ser/Thr protein kinase
3-110     21      SMR          17q25        8400736^a      β-tubulin cofactor D
                                            10435982
2-1       22      SMR          17q25        1655842^a      Sulfamidase
                                            14149793       Coiled-coil protein
1-12      23      SMR          17q25        hCG1806389
                                            hCG1796817
2-78      24      gDMR         18q21        13645769^a     Elongin A3
3-8       25      gDMR         18q21        13645769^a     Adjacent to but distinct from 2-78
2-3       26      SMR          18q23        6912444        Voltage-dependent K+ channel
                                            1914872        Choline-binding protein
                                            5326898        RNA Polymerase II CTD phosphatase
1-13      27      gDMR         18q23        hCG1651089
                                            C50i5<3Z41     SALL3 Spalt-like zinc finger protein
                                            hCG20372       ATPase
                                            1651088        Mucin 1 precursor
1-6       28      gDMR         19p13.1      14249150^a     Hypothetical protein
4-3       29      SMR          19p13.1      9665054^a      Ser/Thr kinase 11
                                            hCG23965       Ankyrin repeat protein
                                            4506715        Ribosomal protein S28
                                            hCG1794585
                                            10732648       Angiopoietin-like protein
1-2       30      SMR          19q13.4      12053197       Zinc finger protein
                                            4505329        NSF attachment protein
                                            5901994        Kaptin actin binding protein
                                            5689511        Na/Ca exchanger protein
                                            7657128        Glioma tumor suppressor candidate
                                            7657054        EH-Domain containing protein
                                            7657130        Glioma tumor suppressor candidate
2-4       31      gDMR         20q12        17484155^a     Nuclear factor of activated T-cells 2
                                            hCG1800975
                                            hCG1653833
                                            13378306       Brain RPTmam4 isoform
                                            110743         Neurofilament triplet H protein
                                            7799072        DNA helicase

Accession numbers correspond to GenBank entries within 10 kb of the CpG island unless there is no
GenBank entry, in which case they correspond to Celera entries. One additional SMR could not be mapped.
^a CpG island lies within the transcript.



[0156] An additional 17 clones identified CpG islands that
were methylated equally in OT, CHM, and sperm (Table 1).
For example, CpG islands 3-110, 3-10, 2-1, and 1-41
showed an identical pattern after Mse I+Hpa II digestion, as
after Mse I alone, in OT and CHM (FIG. 4). We termed
these sequences SMRs, to connote their comparable methylation
in male and female tissue of germline origin. Like the
gDMRs, these SMRs were methylated in cells of somatic
origin (FIG. 2A).

EXAMPLE 3 

[0157] Chromosomal Location of Methylated CpG Rich
Regions and Association with Genes 

[0158] This Example shows that many SMRs are located 
near the ends of chromosomes and identifies CpG islands 
isolated herein that reside near known genes. Chromosomal 
locations of the identified CpG islands were determined by 
identifying corresponding GenBank human genomic DNA
sequences of known genomic location, using well-known 
nucleic acid sequence search tools such as BLAST. 

[0159] The methylated CpG islands identified here were 
distributed throughout the genome. There was a striking 
localization of SMRs near the ends of chromosomes. Sixteen
of 17 SMRs were localized near the ends of chromosomes,
either on the last (n=15) or the penultimate (n=1)
subband of the chromosome on which it resided (Table 2). 
In contrast, of 12 gDMRs that could be mapped (of the 13 
gDMRs studied), only four were localized near the ends of 
chromosomes (Table 2). This difference was highly statis- 
tically significant (P=0.0008, Fisher's exact test). The asso- 
ciation of SMRs near the ends of chromosomes is consistent 
with an observation of densely methylated GC-rich 
sequences near telomeres, although that study did not 
describe methylated CpG islands (Brock et al. 1999). In 
addition, there was a segregation of gDMRs and SMRs 
within compartments of differing genomic composition, i.e., 
isochores, which are regions of several hundred kilobases of 
relatively homogeneous GC composition (Bernardi 1995).
Approximately 75% of the SMRs fell within high isochore
regions (G+C ≥50%), as might be expected from the high GC
content of methylated CpG islands. Surprisingly, however, 
all of the gDMRs fell within low isochore regions 
(G+C<50%), i.e., of relatively low GC content, despite the 
high GC content of the gDMRs themselves (L. Z. Strich- 
man-Almashanu and A. P. Feinberg). This difference was 
statistically significant (P<0.01, Fisher's exact test). Thus, 
the gDMRs and SMRs may lie within distinct chromosomal 
and/or isochore compartments. These results provide the 
basis for a method to identify epigenetic chromosomal
domains. Localization of CpG islands to the telomeric/subtelomeric
regions, for example, can be used for identifying imprinted
gene domains; disease domains (e.g., p16); chromatin-regulated
genes controlled at a distance, such as telomerase
(TERT) or c-myc by CTCF; and developmentally programmed
regions essential for organ formation, such as the
brain (see, for example, Lunyak et al., Science, Oct. 24, 2002).



TABLE 2

Band Location of Methylated CpG Rich Regions

                 Band Location
CpG Rich region  Centromeric  Midchromosome  Telomeric

gDMR             0            8              4
SMR              1            0              16
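
For illustration, the comparison of telomeric localization between SMRs and gDMRs can be reproduced with a two-sided Fisher's exact test on the counts of Table 2. The following Python sketch is an assumption about how the counts are tabulated: the centromeric and midchromosome columns are collapsed into a single non-telomeric class (SMR: 16 telomeric, 1 non-telomeric; gDMR: 4 telomeric, 8 non-telomeric). The function name is hypothetical, and only the Python standard library is used. Under this tabulation the sketch gives approximately 0.00086, in line with the reported P=0.0008.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: SMR, gDMR; columns: telomeric, non-telomeric (collapsed from Table 2).
p_value = fisher_exact_two_sided(16, 1, 4, 8)
print(f"P = {p_value:.5f}")
```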



[0160] There were several examples of nonredundant,
unique methylated CpG islands localizing to the same
chromosomal region. In two cases, pairs of sequences
were adjacent within the genome. Two SMRs on 4q35, 1-41
and 4-8, were adjacent to each other; and two gDMRs on 
18q21, 2-78 and 3-8, also were adjacent to each other (Table 
1). In addition, 14 methylated CpG islands were located near 
and on the same chromosomal subband as other methylated 
CpG islands (Table 1). For example, SMRs 3-110, 2-1, and 
1-12 are all on 17q25; two of these sequences, 3-110 and
1-12, lie within 660 kb. In some cases, SMRs and gDMRs
were found in relatively close proximity. For example, SMR
2-3 and gDMR 1-13 lie within 1 Mb on 18q23. In addition,
gDMR 1-20 and SMR 3-12 are both on 10q26 and separated
by ~800 kb (Table 1). All of these data together support the
idea that these methylated CpG islands identify specific 
portions of the genome. 

[0161] Most of the methylated CpG islands were localized 
within or near the coding sequence of known genes or of 
anonymous ESTs within the GenBank or Celera databases. 
Because of the known ability of DMRs to regulate imprint- 
ing over long distances (reviewed in Feinberg 2001), the 
identity of known or predicted genes within several hundred 
kilobases of each methylated CpG rich region was determined.
Particularly intriguing was the discovery that gDMR
3-4 was located on 6q24 within HYMAI (FIG. 5), an
imprinted gene involved in diabetes mellitus (Arima et al. 
2000). This CpG rich region has been identified indepen- 
dently as a DMR, in a specific analysis of this gene (Arima 
et al. 2001), and isolation of this sequence using a method 
of the present invention indicates that these methylated CpG 
islands may identify imprinted gene domains. gDMR 1-13 
was located on 18q23, within a predicted gene of unknown 
function, and near the SALL3 gene (FIG. 5), which encodes 
a Spalt-like zinc finger protein that is a candidate gene for
18q deletion syndrome (10610715), which involves prefer- 
ential loss of the paternal allele (Kohlhase et al. 1999). 
Interestingly, 18q23 also has been implicated in bipolar 
affective disorder, specifically harboring a predisposing gene 
transmitted preferentially through the father (Stine et al. 
1995; McMahon et al. 1997). Therefore, the localization of 
this gDMR may serve as a guidepost for identifying candidate
imprinted genes for this important disease. SMR 1-2
was located within 19q13.4 (FIG. 5). Even though this
sequence is an SMR, 19q13.4 contains the imprinted genes
PEG3 and ZIM1 (Kim et al. 1999). Given that SMR 1-2 is
~10 Mb from these genes, it is unlikely to lie within the same
imprinted gene domain. Nevertheless, it will be of interest to 
examine nearby genes for their imprinting status, including 
a glioma tumor suppressor candidate gene located 110 kb 
telomeric to SMR 1-2. Another interesting gene harboring a 
methylated CpG rich region was histone deacetylase 4
(HDAC4), and there were several other predicted genes near 
this CpG rich region, SMR 3-20 (FIG. 5). In addition, 
several antisense transcripts are associated with this CpG 
rich region. Given that HDAC4 is itself involved in chro- 
matin remodeling (Wang et al. 2000), methylation of this 
region could be involved in a feedback loop controlling 
chromatin structure. Other genes located near methylated 
CpG islands included the wolframin gene, a transmembrane 
protein involved in congenital diabetes (Strom et al. 1998); 
several olfactory receptor genes; several phosphatase and 
kinase genes likely involved in signal transduction; several 
genes for DNA-interacting proteins; and the Peutz-Jeghers
syndrome gene STK11 (Table 1). A voltage-dependent
potassium channel subunit protein was localized only 16 kb
from methylated CpG rich region 2-3 (Table 1), which is of
interest given that the voltage-dependent potassium channel
KvLQT1 is imprinted (Lee et al. 1997). Finally, in addition
to genes directly adjacent to these methylated CpG islands, 
at least two of the domains flanked by methylated CpG 
islands harbored several genes within them that may play a 
role in cancer. For example, contained within the region 
defined by methylated CpG islands 3-110 and 1-12 are a 
predicted apoptosis inhibitor, a septin-like cell division 
gene, a ras homolog, and a predicted translation initiation
factor (Table 1). 
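
The neighborhood search described in this paragraph (listing genes within a fixed distance of a methylated CpG rich region) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name `genes_within`, the 500 kb default window, the gene labels, and the sign of each offset are assumptions. Only the approximate distances (110 kb and ~10 Mb from SMR 1-2) come from the text.

```python
def genes_within(region_kb, gene_positions_kb, window_kb=500.0):
    """Return (gene, distance_kb) pairs within window_kb of the region, nearest first."""
    hits = []
    for gene, position_kb in gene_positions_kb.items():
        distance_kb = abs(position_kb - region_kb)
        if distance_kb <= window_kb:
            hits.append((gene, distance_kb))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical layout with SMR 1-2 placed at 0 kb. The distances follow the
# text; the direction of each offset is an assumption made for illustration.
NEIGHBORS_OF_SMR_1_2 = {
    "glioma_suppressor_candidate": 110.0,
    "PEG3": -10_000.0,
    "ZIM1": -10_000.0,
}

if __name__ == "__main__":
    print(genes_within(0.0, NEIGHBORS_OF_SMR_1_2))
    print(genes_within(0.0, NEIGHBORS_OF_SMR_1_2, window_kb=20_000.0))
```

With the default window only the glioma candidate is returned, which mirrors the statement that SMR 1-2 is unlikely to lie within the same domain as PEG3 and ZIM1.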

EXAMPLE 4 

[0162] Identification of an Imprinted Gene Homologous to 
Elongin A 

[0163] This example illustrates the use of the methods of 
the present invention for identifying novel genes associated 
with CpG islands. More specifically, this example illustrates
the use of the methods of the present invention to identify 
the Elongin A gene. 

[0164] Imprinting Analysis of Elongin A3 Gene 

[0165] Fetal tissues and matched maternal decidua were 
obtained from the University of Washington Fetal Tissue 
Bank. We identified polymorphisms by sequencing fetal and 
maternal PCR amplified genomic DNA. The following 
conditions were used for PCR amplifications: 95° C, 2 min; 
then 40 cycles of 95° C. 1 min, 60° C. 30 sec, 72° C. min; 
then 72° C. for 9 min. Total RNA was isolated from fetal 
tissues using RNeasy mini kit (Qiagen). To eliminate DNA 
contamination from RNA preparations, samples were treated
with pre amplification-grade DNase I (Invitrogen) according 
to supplied protocols. RT-PCR was carried out using the 
Superscript II preamplification system (Invitrogen) and was 
performed for each sample in the presence and absence 
(negative controls) of RT. cDNA samples were sequenced 
only when no bands were obtained with the negative con- 
trols. The primers used for the imprinting analysis were 
EL2AL-1093-1112F: 5'-TCT GCT GTC CGC TTT TGA 
GG-3' (SEQ ID NO: 32) and EL2AL-1526-1550R: 5'-ArC 
GGA TTT TCG TGG TCA CTA CTC G-3' (SEQ ID NO: 
33). DNA and cDNA sequencing was run on an ABI-377 
automated sequencer following protocols recommended by 
the manufacturer (Perkin-Elmer). 
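
As an aid to reading the primer names, the following sketch estimates the expected amplicon size. It rests on an assumption not stated in the text: that the numbers in each primer name are 1-based, inclusive nucleotide coordinates of the primer on the target sequence. The function names are hypothetical.

```python
import re


def parse_primer_name(name):
    """Split a name such as 'EL2AL-1093-1112F' into (start, end, strand)."""
    match = re.fullmatch(r"(.+)-(\d+)-(\d+)([FR])", name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name}")
    return int(match.group(2)), int(match.group(3)), match.group(4)


def amplicon_length(forward_name, reverse_name):
    """Length from the first base of the forward primer to the last base of the reverse primer."""
    f_start, _, f_strand = parse_primer_name(forward_name)
    _, r_end, r_strand = parse_primer_name(reverse_name)
    if (f_strand, r_strand) != ("F", "R"):
        raise ValueError("expected a forward (F) and a reverse (R) primer")
    return r_end - f_start + 1


if __name__ == "__main__":
    start, end, _ = parse_primer_name("EL2AL-1093-1112F")
    # The coordinate span should equal the length of the forward primer.
    print(end - start + 1, len("TCTGCTGTCCGCTTTTGAGG"))
    print(amplicon_length("EL2AL-1093-1112F", "EL2AL-1526-1550R"))
```

Under this assumption the forward primer spans 20 positions, which agrees with its 20-nucleotide sequence, and the product would span positions 1093 to 1550.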

[0166] In addition to HYMAl, described above, a DMR 
within the IGF2R contains an Eag I site, and as predicted, 
this gene also was found in the Eag library. Allele-specific
expression of genes near methylated islands was examined. 
gDMR 2-78 was localized to 18q21 (FIG. 5) and was 
completely methylated in all somatic fetal and adult tissues 
tested (FIG. 2). However, this CpG rich region was unm- 
ethylated in CHM and sperm and methylated in OT (FIG. 
3A). A BLAST search showed that the CpG rich region 
spanned the putative promoter region and body of a gene 
predicted by GENSCAN (http://genes.mit.edu/GENSCAN), 
and included 1638 nucleotides encoding 546 amino acids 
(FIG. 6). BLAST searches of GenBank and Celera databases
using the predicted sequences revealed that the predicted
gene showed 43% amino acid identity to human
transcription elongation factor B (SIII) polypeptide 3 
(TCEB3), also known as Elongin A. The novel sequence was 
even more closely related to a previously identified homolog 
of Elongin A termed Elongin A2, or TCEB3L, showing 79% 
amino acid sequence identity to human transcription elongation
factor (SIII) Elongin A2 (TCEB3L). To determine
whether 2-78 represented a genuine transcript, and if so, 
whether the gene is imprinted, primers were designed that 
would amplify 2-78 but not Elongin A2, and amplification 
products were of the expected size. Sequencing demon- 
strated that the amplified cDNA corresponded to 2-78 and 
not Elongin A2, based on sequence differences between the 
two genes within the PCR product. Analyzing DNA samples 
from fetal tissues, we then identified a polymorphism at 
nucleotide 910 (G/A) of 2-78. Four fetuses heterozygous for 
this polymorphism were identified, in which maternal 
decidua DNA was available and homozygous, allowing the 
identification of parental origin in the fetal samples (FIG. 7). 
Reverse transcriptase PCR (RT-PCR) analysis of tissues 
from these fetuses showed that the gene was indeed tran- 
scribed. We therefore term this gene Elongin A3. An alter- 
native term is TCEB3L2, but for this term to apply, the 
nomenclature committee will need to rename TCEB3L 
(Elongin A2) TCEB3L1. 

[0167] Analysis of allele-specific expression showed
monoallelic expression in lung, brain, placenta, and spinal
cord, with preferential expression from the maternal allele
(FIGS. 7A-D). There was incomplete preferential expression
from the maternal allele in two of three kidneys (FIGS. 7A,
C), and absence of imprint-specific gene expression in one
kidney and in the intestine or liver (FIGS. 7B, C, D). Thus, 
Elongin A3 shows tissue-specific imprinting, at least in 
prenatal development. Therefore, the isolation of these novel 
CpG islands does enable the identification of novel human 
imprinted genes. 
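
The parental-origin reasoning used in this example (a fetus heterozygous for the G/A polymorphism at nucleotide 910, with a homozygous mother) can be sketched as below. This is a simplified illustration under stated assumptions: genotypes are written as two-letter strings, cDNA results are reduced to the set of alleles observed, and the function names are hypothetical. It does not model the incomplete (quantitative) preference reported for some kidney samples.

```python
def paternal_allele(fetal_genotype, maternal_genotype):
    """Infer the paternally inherited allele, or None if the site is uninformative."""
    fetal = set(fetal_genotype)
    maternal = set(maternal_genotype)
    # Informative only when the fetus is heterozygous and the mother homozygous.
    if len(fetal) != 2 or len(maternal) != 1:
        return None
    if not maternal <= fetal:
        return None  # genotypes inconsistent with maternal inheritance
    (paternal,) = fetal - maternal
    return paternal


def expressed_origin(fetal_genotype, maternal_genotype, cdna_alleles):
    """Classify expression as 'maternal', 'paternal', 'biallelic', or 'uninformative'."""
    paternal = paternal_allele(fetal_genotype, maternal_genotype)
    if paternal is None:
        return "uninformative"
    maternal = next(iter(set(maternal_genotype)))
    observed = set(cdna_alleles)
    if observed == {maternal}:
        return "maternal"
    if observed == {paternal}:
        return "paternal"
    if observed == {maternal, paternal}:
        return "biallelic"
    return "uninformative"


if __name__ == "__main__":
    print(paternal_allele("GA", "GG"))
    print(expressed_origin("GA", "GG", "G"))
    print(expressed_origin("GA", "GG", "GA"))
```

A maternally homozygous G/G decidua with a G/A fetus means the A allele was inherited from the father, so cDNA showing only G indicates expression from the maternal allele.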

EXAMPLE 5 

[0168] Species Conservation of Methylated CpG Rich
Regions

[0169] This example illustrates that the CpG islands iden- 
tified herein are conserved among mammalian species and 
can be used to identify nearby regulatory elements con- 
served between species. 

[0170] As further confirmation of the importance of the 
methylated CpG islands that were isolated, their sequence 
conservation in the mouse was ascertained using the Celera
mouse genome database. Thirteen (46%) of the 30 human
noncontiguous methylated CpG islands matched sequences
within the mouse genome at 86.9±4.9% identity (FIG. 8).
Furthermore, in some cases, the region of conservation 
extended beyond the CpG rich region itself. For example, 
gDMR 1-21 showed, in addition to a 558 bp, 82% conserved 
region including the CpG rich region, five additional con- 
served sequences within 1 kb of the CpG rich region. These 
additional sequences varied from 80-97% identity (FIG. 8). 
Most of the conserved sequences outside of the CpG islands 
themselves were not predicted genes, and thus may repre- 
sent conserved regulatory sequences. In all cases in which 
BLAST analysis of the CpG rich region and flanking 1 kb on 
each side was performed, and in which any sequence con- 
servation was found, the CpG rich region itself was con- 
served, again supporting the idea that these CpG islands play 
an important role. 
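
Percent identity figures such as those reported above are derived from pairwise alignments. The sketch below shows one common way to compute such a figure from two already-aligned sequences. It is an assumption-laden illustration: identity is counted over aligned, non-gap columns only, whereas alignment programs such as BLAST may use a different denominator, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not data from the examples.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```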

EXAMPLE 6 

[0171] Normally Methylated CpG Islands and GC Rich
Sequences 

[0172] This Example provides further insight into the 
methods of the present invention and the conclusions 
reported herein. A major conclusion of the previous
Examples is the identification of a subset of unique CpG 
islands that are methylated in normal tissues, in the first 
systematic effort to identify such sequences. The experiments
were designed to identify CpG islands that are methylated
differentially in germline-derived tissues or differentially
in cancers. Although no CpG islands methylated
specifically in tumors were found, slightly more than one
half of the unique methylated CpG islands were methylated
in germline-derived tissues of both maternal and paternal
origin. Conventional wisdom holds that CpG islands are 
unmethylated, with the exception of the inactive X chromo- 
some, imprinted genes, and tumors. However, rare excep- 
tions to this rule have been described. Some repeated 
sequences harboring CpG islands have been found to be 
methylated. Methylation of a mouse testis-specific histone 
H2B gene has been reported (Choi et al. 1996), and others 
have found methylation of some ribosomal gene sequences 
(Brock and Bird 1997). Indeed, methylation of one of these 
repeat sequences, the rDNA nontranscribed spacer, previ- 
ously was found after genomic purification from a methyl- 
CpG binding protein column (Brock and Bird 1997), and the 
large number of these sequences may have obscured the 
identification of unique methylated CpG islands. The methy- 
lation of high copy number sequences is not surprising, as it 
is consistent with the hypothesis that CpG methylation arose 
as a host defense mechanism (Bestor and Tycko 1996). This 
is particularly true of the SVA element, which is a high copy 
number retroposon. 

[0173] However, the presence of normally methylated 
unique CpG islands and GC rich regions has not been 
observed systematically. An intriguing exception is the 
MAGE melanoma gene (Serrano et al. 1996), and it is 
thought that hypomethylation of this gene leads to its 
activation in cancer (De Smet et al. 1996). Our results 
suggest that normally methylated single-copy CpG islands
and GC rich regions may be more abundant than previously 
believed. Indeed, the loss of methylation of such sequences 
may be related to gene activation in cancer, just as the gain 
of methylation of CpG islands and GC rich regions may lead 
to their silencing. Previous screens for altered CpG rich 
region methylation have not been designed to identify 
normally methylated CpG islands and GC rich regions, but 
it should be noted that the original observation of altered 
methylation in cancer was widespread loss of methylation 
(Feinberg and Vogelstein 1983). Furthermore, even in 
tumors that show increased CpG rich region methylation, 
the total methylation content is reduced (Feinberg et al. 
1988). DNA methylation serves as an additional layer of 
genetic information in the genome, which has been termed 
the methylome (Feinberg 2001), and both increases and 
decreases may be important in cancer. Our strategy for 
cloning these sequences can be generalized to secondary 
libraries in addition to the Eag library, and the identification 
of additional such sequences thus should enhance our under- 
standing of the methylome. 

[0174] Another major result of the above Examples is the 
identification of novel CpG islands and GC rich regions that 
are methylated differentially in OT and CHM. The second
(Eag) library would not identify known imprinted genes 
lacking Eag I sites, but it did contain the DMR of IGF2R 
(Wutz and Barlow, Mol. Cell Endocrinol. May 25,
1998;140(1-2):9-14), as well as the DMR of the imprinted
HYMAI gene, suggesting that this strategy also can identify
novel imprinted gene domains. One such gene was identified
to date, a novel homolog of the Elongin A and Elongin A2 
genes, which we term Elongin A3. Both Elongin A and 
Elongin A2 are known to be the active components of the 
transcription factor B (SIII) complex (Aso et al. 2000), that 
may compete for other components (Elongin B and C) with 
the VHL tumor suppressor gene (Kibel et al. 1995). We did 
not check directly for elongation activity of Elongin A3, but 
it contains the TFS2N motif as well as a nuclear localization 
signal, and the predicted protein sequence is 79% identical 
to that of Elongin A2, so it likely does have such a function. 

[0175] It should be noted that gDMRs, even the gDMR 
within this novel imprinted gene, showed variable to com- 
plete methylation in somatic tissues. Such a pattern of 
methylation also is similar to that seen for the promoter of 
the imprinted gene ZNF127 (Strom et al. 1998), and for at 
least one methylated CpG rich region within the 11p15
imprinted gene domain. Thus, imprinted gene domains may 
harbor some methylated CpG islands and GC rich regions 
that show persistent differential methylation in somatic 
tissues, but also may contain other CpG islands and GC rich 
regions that do not show these differences in somatic tissues. 
Thus, it is important to compare methylation in sperm or 
CHM as a representation of the male germline, and OT (as 
eggs cannot be harvested from humans for this purpose), in 
the search for imprinted gene domains. The mouse is a 
useful adjunct and provides access to a greater variety of 
tissues at varying developmental stages, but there are sub- 
stantial differences between human and mouse imprinting, 
both in the identity of the genes themselves, and in their 
developmental pattern of imprinting. 

[0176] Several of these domains harbor multiple genes 
that have been implicated in cancer, and that show frequent 
loss of heterozygosity, including 4p16, 4q35, 10q26, 18q21,
and 19p13. An imprinted tumor suppressor gene in one or
more of these regions might not show conventional muta- 
tions in tumors, and thus identifying imprinted genes is an 
important part of tumor suppressor gene identification 
within these regions. The same region of 18q also has shown 
linkage in bipolar affective disorder, with preferential trans- 
mission through the paternal allele (McMahon et al. 1997). 
Furthermore, these domains appear to harbor both SMRs 
and gDMRs, suggesting that both types of methylated CpG 
islands and GC rich regions may be useful for identifying 
imprinted gene domains. 

[0177] CpG islands and GC rich regions normally must be 
under selective pressure for their maintenance, as methylation
leads to deamination and loss of cytosine. This is
especially true in the case of the SMRs we have described, 
as they are methylated even in sperm DNA. In the case of 
gDMRs, their methylation in somatic tissues and oocyte- 
derived cells may be critical for suppression of nearby gene 
expression in spermatocyte progenitor cells. This may be 
particularly important for genes involved in establishing 
epigenetic states and in epigenetic reprogramming, as the 
chromatin of spermatocytes differs markedly from that of oocytes
and somatic cells.
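
The depletion of CpG dinucleotides through deamination of methylated cytosine is often quantified with an observed/expected CpG ratio. The sketch below uses a widely used formula, (number of CpG x sequence length) / (number of C x number of G). That formula and the function name are assumptions introduced for illustration; the examples above do not state which measure, if any, was applied.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G); 0.0 if C or G is absent."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequences, not data from the examples.
    print(cpg_obs_exp("CGCGCGCG"))  # CpG-dense
    print(cpg_obs_exp("CCGG"))      # CpG at the frequency expected from base composition
    print(cpg_obs_exp("CAGTCTGA"))  # C and G present but no CpG
```

A ratio well below 1 is what long-term loss of methylated cytosines would be expected to produce, which is why maintained CpG rich sequences suggest selective pressure.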

[0178] It also is likely that normally methylated CpG 
islands and GC rich regions are involved directly in chro- 
matin formation. For example, they could serve as chroma- 
tin insulators separating enhancers from promoters. If that is 
so, then we would expect to find their loss of methylation in 
specific tissues at specific developmental stages, which
would be consistent with the observation that imprinted 
genes can show developmental (tissue- and timing-specific) 
imprinting (Lee et al. 1997). Support for this idea also comes 
from our observation that SMRs were more frequently 
localized near the ends of chromosomes. Given that chro- 
mosomal ends are associated with the nuclear lamina in 
interphase (Cockell and Gasser 1999), the relative proximity 
of SMRs to the ends of chromosomes might permit their 
association with the nuclear lamina and chromatin proteins 
found within it. 

[0179] Normally methylated CpG islands and GC rich 
regions also might promote chromatin formation. In an 
intriguing review, Pardo-Manuel de Villena et al. (2000)
suggest that imprinting involving differences among
homologous chromosomes arose under selective pressure to 
facilitate pairing and distinguish homologous chromosomes 
during meiosis. We suggest that SMRs also might enhance 
pairing and recombination by recruiting chromatin factors to 
specific locations along a given chromosome and allowing 
those factors to interact between homologous chromosomes. 
A prediction of our suggestion is that recombination fre- 
quencies in meiosis or even mitosis might be enhanced near 
normally methylated CpG islands and GC rich regions. 
Methylated CpG islands and GC rich regions also may play 
a role in intrachromosomal compartmentalization. For
example, the gDMRs lay within regions of comparatively 
lower CpG content (GC-poor isochores). Consistent with 
this idea, we have noted that most known imprinted genes 
also appear to lie within low isochore regions (PLAGL1,
IGF2R, PEG1/MEST, SNRPN, PEG3, GNAS).

[0180] Finally, the identification of these methylated CpG 
islands and GC rich regions will facilitate comparison of 
their sequences to each other, as well as computational 
analysis of sequence motifs. For example, in preliminary 
experiments, several CTCF binding sites within at least 10 
methylated CpG islands and GC rich regions have been 
identified. Therefore, CTCF binding may be a common 
feature of these sequences. 

[0181] It has recently been proposed that CpG islands and 
GC rich regions fall into several groups, one of which 
represents unique and generally unmethylated CpG islands
and GC rich regions associated with the 5' region of house- 
keeping genes, whereas another includes high-copy nongene 
CpG islands and GC rich regions that are dominated by Alu 
I repeat elements (Ponger et al. 2001). Because Alu I repeats 
are generally methylated and transcriptionally silent, high- 
copy CpG islands and GC rich regions are predicted to be 
methylated. Indeed, the report of Strichman-Almashanu et 
al. (2002) identified one of the high-copy CpG islands and 
GC rich regions (SVA) to be heavily methylated. This 
observation is not surprising given that repeat sequences 
provide signatures for de novo methylation, according to the 
host defense model (Bestor and Tycko 1996). 

[0182] Strichman-Almashanu et al. (2002) also report the 
existence of a new class of unique CpG islands and GC rich 
regions that are methylated on both alleles in all tissues 
examined. Interestingly, these CpG islands and GC rich 
regions (SMRs) mapped to isochores with high GC content 
(>0.5), whereas the differentially methylated islands and GC 
rich regions (gDMRs) were concentrated in isochores with 
low GC content (<0.5). The class of unmethylated or differentially
methylated CpG islands and GC rich regions
could stand out in a CpG-less environment and provide 
landmarks for various recognition events, such as the ini- 
tiation of chromatin condensation by TP2 during spermio- 
genesis (Kundu and Rao 1996). The complexity of CpG rich 
region compartmentalization of the mammalian genome was 
further emphasized by the observation that the methylated 
high-copy CpG islands and GC rich regions frequently 
localize close to telomeric ends (Strichman-Almashanu et
al. 2002), as do densely methylated nonrich region CpG 
stretches (Brock et al. 1999), indicating some methylation- 
dependent role in chromosomal integrity. This deduction is 
supported by the observation that DNA methyltransferase,
Dnmt1, is essential for genomic stability in mouse embryonic
stem cells (Chen et al. 1998).
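
The isochore distinction drawn above (GC content above or below 0.5) can be illustrated with a short sketch. The 0.5 threshold is taken from the text; the function names, the class labels, and the handling of ambiguous bases are assumptions. In practice the sequence passed in would be a long genomic window surrounding the region rather than the short toy strings used here.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def isochore_class(seq, threshold=0.5):
    """Label a window as 'high-GC', 'low-GC', or 'boundary' relative to the threshold."""
    gc = gc_fraction(seq)
    if gc > threshold:
        return "high-GC"
    if gc < threshold:
        return "low-GC"
    return "boundary"


if __name__ == "__main__":
    # Toy windows, not data from the examples.
    print(isochore_class("GGCCGCAT"))
    print(isochore_class("ATATGCAT"))
```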

TABLE 3

Chromosomal Location of SMRs and gDMRs

Clone ID  Methylation pattern  Chromosomal location  Accession no.

3-10      SMR                  1q44 TEL              14042537a, 14423768, hCG1644736, hCG1724357
3-20      SMR                  2q36 SUBTEL           5174481a, hCG1651464, hCG1656118, hCG1651461, hCG1651466
1-19      SMR                  4p16 TEL              3777583a
1-41      SMR                  4q35 TEL              hCG1788598a, hCG1793025a, hCG1787540
4-8       SMR                  4q35 TEL              hCG1788598a
3-4       gDMR                 6q24 SUBTEL           hCG1660630a
4-7       SMR                  7p22 TEL              6806913a, hCG1747708, hCG1790856, hCG1747710
1-30      SMR                  7q11.1                hCG1779529a, hCG1779527, hCG1789113, 13642872, 6572672
1-22      SMR                  7q36 TEL              11386149a, hCG1799787
3-2       SMR                  8p23 TEL              hCG1659058a
2-5       gDMR                 8q21.2                17451956, hCG1757665
2-48      gDMR                 9p13                  hCG1659616
1-20      gDMR                 10q26 TEL             13325182a, hCG1799063
3-12      SMR                  10q26 TEL             3122245a, hCG1654478
3-30      gDMR                 11q25 TEL             17456499a, hCG37607, hCG1745526
1-5       SMR                  13q34 TEL             hCG20146a
1-21      gDMR                 14q32 TEL             8393715a, hCG21408
2-42      gDMR                 16p13.1               hCG15669a
3-110     SMR                  17q25 TEL             8400736a, 10435982
2-1       SMR                  17q25 TEL             1655842a, 14149793
1-12      SMR                  17q25 TEL             hCG1806389, hCG1796817
2-78      gDMR                 18q21                 13645769a
3-8       gDMR                 18q21                 13645769a
2-3       SMR                  18q23 TEL             6912444, 1914872, 5326898
1-13      gDMR                 18q23 TEL             hCG1651089a, 6688241, hCG20372, 1651088
1-6       gDMR                 19p13.1               14249150a
4-3       SMR                  19p13.3 TEL           9665054a, hCG23965, 4506715, hCG1794585, 10732648
1-2       SMR                  19q13.4 TEL           12053197, 4505329, 5901994, 5689511, 7657128, 7657054, 7657130
2-4       gDMR                 20q12                 17484155a, hCG1800975

[0249] Although the invention has been described with
reference to the above examples, it will be understood that 
modifications and variations are encompassed within the 
spirit and scope of the invention. Accordingly, the invention 
is limited only by the following claims. 

What is claimed is: 

1. A method for determining a disease state in a subject 
comprising: 

determining the DNA methylation status at a cytosine
residue of a CpG sequence in a genomic DNA sample 
from the subject, wherein hypomethylation of a CpG 
sequence normally methylated in a subject not having 
the disease state, is indicative of a disease state in the 
subject. 

2. The method of claim 1, wherein the CpG sequence is 
within a GC rich region or a CpG island. 

3. The method of claim 1, wherein the subject is a human. 

4. The method of claim 1, wherein the disease is cancer. 



5. The method of claim 1, wherein the disease is selected 
from cancer, multiple sclerosis, Alzheimer's disease, Par- 
kinson's disease, depression and other imbalances of mental 
stability, atherosclerosis, cystic fibrosis, diabetes, obesity, 
Crohn's disease, and altered circadian rhythmicity, arthritis, 
inflammatory reactions or disorders, psoriasis and other skin 
diseases, autoimmune diseases, allergies, hypertension, 
anxiety disorders, schizophrenia and other psychoses, 
osteoporosis, muscular dystrophy, amyotrophic lateral scle- 
rosis or circadian rhythm-related conditions. 

6. A method for determining the DNA methylation status 
at a cytosine residue of a CpG sequence in a genomic DNA 
sample, comprising performing methylation state analysis of 
one or more CpG islands or GC rich regions of a genomic 
DNA sample, thereby determining the DNA methylation 
status in the genomic DNA sample. 

7. The method of claim 6, wherein the one or more CpG 
islands or GC rich regions include differentially methylated
regions (DMRs). 

8. The method of claim 6, wherein the one or more CpG 
islands and GC rich regions include similarly methylated 
regions (SMRs). 

9. The method of claim 6, wherein the CpG island or GC 
rich region is selected from the group consisting of SEQ ID 
NOs: 3-7, SEQ ID NO: 9-30 and SEQ ID NO: 31 and any 
combination thereof. 

10. The method of claim 8, wherein the CpG island or GC 
rich region includes at least one of SEQ ID NO: 3, SEQ ID 
NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ 
ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 
12, SEQ ID NO: 16, SEQ ID NO: 21, SEQ ID NO: 22, SEQ 
ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, or SEQ ID 
NO: 30. 

11. The method of claim 8, wherein the CpG island or GC 
rich region includes at least SEQ ID NO: 4. 

12. The method of claim 7, wherein the CpG islands and 
GC rich regions are least one of SEQ ID NO: 13, SEQ ID 
NO: 14, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, 
SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID 
NO: 27, SEQ ID NO: 28, and SEQ ID NO: 31. 

13. The method of claim 6, wherein the methylation state 
of at least two CpG islands and GC rich regions is identified. 

14. The method of claim 6, wherein the methylation state 
of at least three CpG islands and GC rich regions is 
identified. 

15. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 27 is identified. 

16. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 30 is identified. 

17. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 24 is identified. 

18. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 8 is identified. 

19. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 4 is identified. 

20. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 26 is identified.

21. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 21 is identified. 

22. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 23 is identified. 

23. The method of claim 6, wherein the methylation state 
of SEQ ID NO: 21 and SEQ ID NO: 23 is identified. 



US 2003/0232351 A1, Dec. 18, 2003



24. An isolated polynucleotide comprising about 1638 
nucleotides encoding about 546 amino acids and having 
about 79% amino acid sequence identity to Elongin A2 and 
a GenBank accession number NM_145653.

25. The polynucleotide of claim 24, wherein nucleotide 
910 is G or A. 

26. The polynucleotide of claim 24, wherein the poly- 
nucleotide is set forth in SEQ ID NO: 1. 

27. The polynucleotide of claim 24, wherein the poly- 
nucleotide encodes a polypeptide as set forth in SEQ ID NO: 
2. 

28. An isolated polynucleotide comprising SEQ ID NO: 1 
and sequences 5' and 3' to SEQ ID NO: 1 containing CpG 
islands and GC rich regions. 

29. A purified polypeptide encoded by the polynucleotide 
of claim 24. 

30. Antibodies that bind to the polypeptide of claim 29. 

31. A method for identifying the presence of an imprinted 
gene comprising: 

comparing the DNA methylation status at a cytosine 
residue of a CpG sequence in a genomic DNA sample 
of maternal origin with the DNA methylation status at 
a cytosine residue of a CpG sequence in a genomic 
DNA sample of paternal origin, wherein a difference in 
DNA methylation status between the two samples is 
indicative of the presence of an imprinted gene. 

32. The method of claim 31, wherein the difference in 
methylation status is in a GC rich region or a CpG island. 

33. A methylation status representation identified by the 
method of claim 31. 

34. A method for identifying the presence of an imprinted 
gene in genomic DNA comprising: 

identifying a population of CpG islands and GC rich 
regions in a genomic DNA sample; 

identifying a candidate gene within about 200 to 2000 
kilobases of a first CpG island or GC rich region in the
population of CpG islands and GC rich regions; and

determining whether the candidate gene is regulated by 
methylation of the first CpG island or GC rich region 
and preferentially methylated in paternal DNA or 
maternal DNA, wherein regulation of the candidate 
gene by methylation of the first CpG island or GC rich 
region and paternal or maternal preferential methyla- 
tion is indicative of an imprinted gene, thereby identi- 
fying the presence of an imprinted gene. 

35. An imprinted gene identified by the method of claim 
34. 

36. A method for identifying a CpG island or GC rich
region-regulated gene, comprising identifying a candidate 
gene within about 200 to 2000 kilobases of a CpG island or 
GC rich region and determining whether the candidate gene 
is regulated by methylation of the CpG island or GC rich 
region, thereby identifying the CpG island or GC rich 
region-regulated gene. 

37. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of a CpG island or GC rich region is selected 
from SEQ ID NO: 3-31, with the proviso that it is not SEQ 
ID NO: 8. 

38. The method of claim 34, wherein the candidate gene 
is regulated by methylation of SEQ ID NO: 13, SEQ ID NO: 
14, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ
ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO:
27, SEQ ID NO: 28, or SEQ ID NO: 31. 

39. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 27. 

40. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 30. 

41. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 24. 

42. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 8. 

43. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 4. 

44. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 26. 

45. The method of claim 34 wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 21. 

46. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 23. 

47. The method of claim 34, wherein the method com- 
prises determining whether the candidate gene is regulated 
by methylation of SEQ ID NO: 21 and SEQ ID NO: 23. 

48. A method for determining the methylation status of a 
population of similarly methylated regions (SMRs) in a 
subject, comprising performing methylation status analysis 
of a population of SMRs of genomic DNA from a human 
sample. 

49. The method of claim 48, wherein the methylation 
status of SMRs is correlated with a disease state. 

50. The method of claim 48, wherein the population of 
SMRs is selected from at least two of SEQ ID NO: 3, SEQ 
ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, 
SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID
NO: 12, SEQ ID NO: 16, SEQ ID NO: 21, SEQ ID NO: 22, 
SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, and SEQ 
ID NO: 30. 

51. The method of claim 48, wherein the population of 
SMRs comprises three SMRs. 

52. The method of claim 49, wherein the disease is 
selected from cancer, multiple sclerosis, Alzheimer's dis- 
ease, Parkinson's disease, depression and other imbalances 
of mental stability, atherosclerosis, cystic fibrosis, diabetes, 
obesity, Crohn's disease, and altered circadian rhythmicity, 
arthritis, inflammatory reactions or disorders, psoriasis and 
other skin diseases, autoimmune diseases, allergies, hypertension,
anxiety disorders, schizophrenia and other psychoses,
osteoporosis, muscular dystrophy, amyotrophic lateral
sclerosis or circadian rhythm-related conditions.

53. A method for identifying a population of low copy 
number CpG islands and GC rich regions, comprising: 

cleaving genomic DNA with both a restriction enzyme 
that cleaves at a recognition site comprising adenosine 
and thymidine residues and a restriction endonuclease 
that cleaves at an unmethylated restriction site com- 
prising cytidine and guanosine residues, to generate a 
population of restriction fragments excluding those that 
are methylated;

cloning restriction fragments of at least 200 nucleotides in
length from the population of restriction fragments, in 
a restriction negative bacteria to generate a first library; 

cleaving cloned DNA of the first library with a restriction 
enzyme that cleaves DNA at a restriction site within a 
CpG island or GC rich region; 

excluding CpG island or GC rich region fragments that 
contain repetitive elements while leaving low copy 
CpG island or GC rich region fragments intact, thereby 
producing a population of low copy number CpG 
islands and GC rich regions. 

54. The method of claim 53, further comprising cloning 
the restriction fragments containing low copy CpG islands 
and GC rich regions to form a library containing a plurality 
of low copy CpG island or GC rich region DNA. 

55. The method of claim 53, wherein the excluding is by 
optionally cleaving cloned DNA of the first library with a 
restriction enzyme that cleaves DNA at a restriction site 
within a CpG island or GC rich region repeat sequence or 
using a methylated CpG binding column. 

56. A library produced by the method of claim 53. 



57. The method of claim 53, wherein the GC rich regions 
are CpG islands. 

58. A method for identifying the DNA methylation status 
at a cytosine residue of a CpG sequence in a genomic DNA 
sample from the subject, wherein hypomethylation of CpG 
sequences compared to a methylated DNA control sample is 
indicative of a disease present within the subject. 

59. A method according to claim 58, characterized in that 
a set of CpG positions comprises at least 3 CpG positions 
that are located in the regulatory region of the same gene. 

60. A method according to claim 58 or 59, characterized 
in that the methylation state of at least 3 different sets of CpG 
positions is identified. 

61. The method of any of claims 1, 6, 31, 34, 48 or 53, 
wherein the GC rich region is in an intron. 

62. The method of any of claims 1, 6, 31, 34, 48 or 53, 
wherein the GC rich region is in an exon. 

63. The method of any of claims 1, 6, 31, 34, 48 or 53, 
wherein the GC rich region is in a regulatory region.

ABSTRACT



The present invention relates to a method of determining the
frequency of an allele in a population of nucleic acid
molecules. The method comprises pooling the nucleic acid 
molecules of a population of nucleic acids, performing 
primer extension reactions using a primer which binds at a 
predetermined site located in nucleic acid molecules, and 
obtaining a pattern of nucleotide incorporation.

METHOD FOR DETERMINING ALLELE
FREQUENCIES 

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. application Ser. 
No. 60/271,703 filed Feb. 27, 2001, the disclosure of which 
is incorporated herein by reference. 


BACKGROUND OF THE INVENTION 

The invention relates to a method of determining the
frequency of an allele within a given population or group,
and in particular to a method of determining allele frequencies
for single nucleotide polymorphisms (SNPs) or other
mutations or genetic variations (e.g. nucleotide insertions,
additions or deletions, gene, chromosome or genome duplications
(or multiplications) etc.) in pooled nucleic acid
samples or other samples (including single samples) which
may contain allelic variants.

Individuals in populations will have genetic differences.
The genetic differences may be represented as the individuals
in the population having different alleles at a given locus.
Alternatively, genetic differences can be related to gene,
chromosome, or whole genome duplications (or other multiplications).
The allele frequency describes the fraction of
the population exhibiting a particular allele. Over a whole
population, there may be many different alleles at a particular
locus. However, where the genetic difference occurs as
alterations of a single nucleotide (single nucleotide polymorphisms
or SNPs), generally only 2 alleles are present in
the population, although triallelic or tetra-allelic SNPs are
known. Studies of allelic association in populations are one
of the most useful and powerful methods for mapping
genes/mutations that contribute to disease. Such studies
require the determination of the genotype (i.e. which allele
is present) at one or several loci in a population. The
frequency of a particular allele in a given population can be
assessed, and the association of that allele with a disease or
other clinical condition (e.g. predisposition to disease, therapeutic
responsibility etc.) can be studied.

Single nucleotide polymorphisms (SNPs) are regularly
used for genetic association studies, and consist of single
nucleotide substitutions. SNPs are normally biallelic markers
(i.e. there are 2 alleles present in the population), and are
the markers of choice for various types of genetic analysis,
because of their high frequency in the genome. SNPs are
found approximately once every 100 to 1000 bases in the
human genome. An SNP has a prevalence of at least 1% in
a given population. Further, they are stable, having much
lower mutation rates than repeat sequences, for example.
The analysis of SNPs is of great importance in several
disciplines within the applied genomic field. Importantly, the
nucleotide sequence variations that are most likely to be
responsible for the functional changes of interest will be
SNPs. Such variations are therefore of great interest, and
many studies directed to identifying functional SNPs contributing
to (or associated with) a particular trait or disease
("phenotype") have been performed. Thus many diseases
and conditions may be associated with (or linked to) single
nucleotide polymorphisms, either alone or in combination.
For example, in WO 00/22166, it has been suggested that a
combination of SNPs within several genes gives a polymorphic
pattern which may be used to predict the likelihood of
developing cardiovascular disease. Obtaining reliable and
accurate data on the frequencies of a given SNP allele in a
given population without testing each member of the population
would have a revolutionary impact on the efficiency
and cost of analysis for large population studies.

However, the frequency of other genetic mutations or 
variants, e.g. insertion/addition/deletion mutations and gene, 
chromosome or genome duplications (in the sense of any
number of multiplications or repeats), and those studied in 
cancer genetics and chromosomal abnormality (e.g. trisomy) 
cases, can be analysed by the method of the invention. 

Allelic association means that across a given population, 
individuals who have a certain allele at one locus may have 
a statistically higher chance of developing a particular 
disease, for example. Thus, the possession of a particular 
allele can cause direct susceptibility to a disease. Alternatively,
the possession of a particular allele may be indirectly
linked to disease susceptibility via association with the 
"disease" allele. 

Association studies attempt to find genes that influence or 
increase susceptibility to disease or traits in any organism. 
This involves determining the frequency of an allele from a 
population of organisms with that trait or disease and 
comparing the results with a control population that do not 
exhibit the disease or trait. Various statistical/mathematical 
methods are known and described in the art for assessing 
allele frequencies based on such studies. In order to perform 
large-scale association studies for single nucleotide polymorphisms,
methods have included laborious and expensive
individual genotyping of individual nucleic acid
samples. Pooling of nucleic acid samples in order to obtain 
allele frequency information has been used to reduce the 
burden of genotyping individual samples. To date, most 
pooling investigations have centred on the use of microsat- 
ellite polymorphisms, with few methods developed for the 
rapid assessment of SNPs in a given population. 
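
By way of illustration only, the comparison of allele frequencies between an affected population and a control population described above can be sketched as a 2×2 chi-square calculation. The function names, the pool sizes and the choice of a chi-square statistic are assumptions made for this sketch and do not come from the text. The sketch also treats the pooled frequency estimates as exact, which a real analysis would not do.

```python
def allele_counts(frequency, n_individuals):
    """Approximate (allele, other-allele) counts for a pool of diploid individuals."""
    total_alleles = 2 * n_individuals
    allele = round(frequency * total_alleles)
    return allele, total_alleles - allele


def chi_square_2x2(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("each row and column of the table must be non-empty")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical pools: 100 affected individuals with an estimated allele
# frequency of 30%, and 100 controls with an estimated frequency of 20%.
cases = allele_counts(0.30, 100)
controls = allele_counts(0.20, 100)
print(cases, controls, chi_square_2x2(cases, controls))
```

A larger statistic indicates a larger difference in allele frequency between the two pools relative to what sampling alone would be expected to produce.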

Studies on allele frequencies tend to rely on radiation- 
based methods, or gel electrophoresis, which have well- 
known drawbacks. A method of determining SNP allele 
frequency using allele-specific fluorescent probes in the 
Taqman® assay (Breen et al., Biotechniques 2000, 28(3)
464-470) has been developed by PE Biosystems. In this 
technique Taqman® probes are used to detect specific 
sequences in Polymerase Chain Reaction (PCR) products by 
employing the 5'→3' exonuclease activity of Taq polymerase.
The Taqman® probe anneals to the target sequence between 
the traditional forward and reverse PCR primers. The Taq- 
man® probe is labelled with a reporter fluorophore and a 
quencher fluorochrome. This technique relies on the possi- 
bility of designing allele specific probes that match the 
annealing temperature of the PCR primers. Moreover, the 
allele specificity of the probe is, in the case of SNPs, 
determined by one out of 17-30 bases. These restrictions 
make it hard to design allele specific probes showing good 
enough temperature discrimination not to bind to the other 
allele. Hence, the signal from such an assay might not 
always accurately represent the frequency of the probe 
specific allele. A disadvantage of this method may be in 
finding assay conditions where a mismatch results in clearly 
distinguishable difference in cleavage of the reporter fluo- 
rophore on the two alleles. Further, Taqman® probes have 
different dyes at the 5' and 3' ends and are therefore costly 
to produce, and must be carefully designed. Taqman requires 
two reactions in order to measure allele frequency, using a 
different probe in each of the two reactions, complementary 
to either allele. It would therefore be advantageous to
develop a method of determining SNP allele frequencies in
pooled nucleic acid in one reaction which was accurate and
reliable, and which avoided the need for labels or reliance on
probe binding to the SNP site.

BRIEF SUMMARY OF THE INVENTION 

It has now been found that a simple, reliable, reproducible 
and accurate method for determining the frequency of an 
allele in a given population, may be performed by pooling 
the nucleic acid sequences of the said population and 
performing a "primer-extension" type reaction, using prim- 
ers designed for particular SNPs/alleles, and detecting the 
pattern of incorporation of nucleotides in said "primer- 
extension" reaction. The pattern may then be analysed to 
determine the frequency of each allele in the pooled nucleic 
acid. 

The method is particularly suited to automation e.g. in 
systems where reaction and reagent dispensing steps take 
place in a microtitre plate format. The methods are particularly
suited for finding SNP markers that are correlated to a
certain trait, for example a specific disease, but may also find
application in other allele frequency applications, such as 
SNP confirmation or analysis of mutations associated with 
cancer or chromosome abnormalities, especially abnormali- 
ties of chromosome number, and other mutations or varia- 
tions involving duplication or loss of chromosomes or 
genes. 

As described further below, the present invention is advantageously
based on a method of "sequencing-by-synthesis"
(see e.g. U.S. Pat. No. 4,863,849 of Melamede). This is a 
term used in the art to define sequencing methods which rely 
on the detection of nucleotide incorporation during a primer- 
directed polymerase extension reaction. The four different 
nucleotides (i.e. A, G, T or C nucleotides) are added cycli- 
cally or sequentially (conveniently in known order), and the 
event of incorporation can be detected directly or indirectly. 
This detection reveals which nucleotide has been incorporated,
and hence sequence information. When the nucleotide
(base) which forms a pair (according to the normal rules of 
base pairing, A-T and C-G) with the next base in the 
template sequence is added, it will be incorporated into the 
growing complementary strand (i.e. the extended primer) by 
the polymerase, and this incorporation will trigger a detect- 
able signal, the nature of which depends upon the detection 
strategy selected. 
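
The sequential addition of nucleotides described above can be sketched, purely as an illustration, by a small simulation. It assumes that the sequence being synthesised (the complement of the template) is known, that each dispensed nucleotide is incorporated across a whole run of identical bases, and that unincorporated nucleotides are removed before the next addition. The function name and the dispensation order are hypothetical.

```python
def incorporation_pattern(read_sequence, dispensations):
    """Return (nucleotide, number incorporated) for each dispensed nucleotide.

    read_sequence is the strand synthesised from the primer, i.e. the
    complement of the template, written 5' to 3'.
    """
    position = 0
    pattern = []
    for nucleotide in dispensations:
        run = 0
        # A dispensed nucleotide is incorporated for as long as it matches
        # the next base to be synthesised (covers homopolymer runs).
        while position < len(read_sequence) and read_sequence[position] == nucleotide:
            run += 1
            position += 1
        pattern.append((nucleotide, run))
    return pattern


# Hypothetical example: the read TCTCTGG with dispensation order T, C, A, T, C, T, G.
for nucleotide, count in incorporation_pattern("TCTCTGG", "TCATCTG"):
    print(nucleotide, count)
```

A count of zero corresponds to a dispensation that produces no signal, and a count of two corresponds to a signal of roughly double height, assuming the detection system responds in proportion to the number of bases incorporated.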

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1a depicts the expected allele frequency (SNP 470R)
and calculated allele frequency determined (estimated) via
Pyrosequencing™. The results are plotted as estimated allele
frequency versus expected allele frequency. Pool 1 has been
calibrated according to Example 3, whereas the DNA concentration
in pool 2 has been assayed via absorbance of light
at 260 nm.

FIG. 1b depicts the expected allele frequency (SNP 461R)
and calculated allele frequency determined (estimated) via
Pyrosequencing™. The results are plotted as estimated allele
frequency versus expected allele frequency. Pool 1 has been
calibrated according to Example 3, whereas the DNA concentration
in pool 2 has been assayed via absorbance of light
at 260 nm. It should be noted that SNP 461R consistently
gives a peak that is 3% too high, and the results shown are
consistent with this.

FIG. 2a depicts the calculated allele frequency results of
4 pools of PCR products determined via Pyrosequencing™.
5 replicate reactions were performed on each pool. The
results are plotted as estimated allele frequency versus
expected allele frequency, both in percentage (%). The pools
contained 27% G, 15% G, 10% G and 5% G. The calculated
allele frequency values (shown as diamonds) are in close
correlation to the expected values (shown as squares).

FIG. 2b depicts the calculated allele frequency results of
4 pools of genomic DNA samples determined via
Pyrosequencing™. 5 replicate reactions were performed on each
pool. The results are plotted as estimated allele frequency
versus expected allele frequency, both in percentage (%).
The pools contained 27% G, 15% G, 10% G and 5% G. The
calculated allele frequency values (shown as diamonds) are in
close correlation to the expected values (shown as squares).

FIG. 3a shows DNA sequencing on pooled genomic DNA
over SNP 470R, the expected sequence of which is
T[C/A]TCTGG. 40 µl PCR product was incubated with 15 µl
magnetic beads (10 µg/µl) and 25 µl 2×BW buffer.
Pyrosequencing™ was then performed on a PSQ™ 96 system
instrument using Pyrosequencing™ SNP reagent kit. The
peak heights were measured in order to calculate the
frequency of the allele. The results are shown generally as
nucleotide incorporated (i.e. A, C, G or T) versus amount of
light released (in RLU). The 2 nucleotide incorporations
which relate to the SNP are marked. The experimental
conditions are as described in Example 4.

FIG. 3b shows DNA sequencing on pooled genomic DNA
over SNP EU4, the expected sequence of which is
[A/G]CTGCCT. 40 µl PCR product was incubated with 15 µl
magnetic beads (10 µg/µl) and 25 µl 2×BW buffer.
Pyrosequencing™ was then performed on a PSQ™ 96 system
instrument using Pyrosequencing™ SNP reagent kit. The
peak heights were measured in order to calculate the
frequency of the allele. The results are shown generally as
nucleotide incorporated (i.e. A, C, G or T) versus amount of
light released (in RLU). The 2 nucleotide incorporations
which relate to the SNP are marked. The experimental
conditions are as described in Example 4.

FIG. 3c shows DNA sequencing on pooled genomic
DNA, over SNP 466F; the sequence of the nucleic acid
should be [C/T/G]A.4GGTTGTCCT (SEQ. ID NO: 1). 40 µl
PCR product was incubated with 15 µl magnetic beads
(10 µg/µl) and 25 µl 2×BW buffer. Pyrosequencing™ was
then performed on a PSQ™ 96 system instrument using
Pyrosequencing™ SNP reagent kit. The peak heights were
measured in order to calculate the frequency of the allele.
The results are shown generally as nucleotide incorporated
(i.e. A, C, G or T) versus amount of light released (in RLU).
The 3 nucleotide incorporations which relate to the SNP are
marked. The experimental conditions are as described in
Example 4.

FIG. 3d shows DNA sequencing on pooled genomic
DNA, over SNP 465R; the sequence of the nucleic acid
should be [C/T]GTTCCACCT (SEQ. ID NO: 2). 40 µl PCR
product was incubated with 15 µl magnetic beads (10 µg/µl)
and 25 µl 2×BW buffer. Pyrosequencing™ was then
performed on a PSQ™ 96 system instrument using
Pyrosequencing™ SNP reagent kit. The peak heights were
measured in order to calculate the frequency of the allele. The
results are shown generally as nucleotide incorporated (i.e.
A, C, G or T) versus amount of light released (in RLU). The
2 nucleotide incorporations which relate to the SNP are
marked. The experimental conditions are as described in
Example 4.

FIG. 3e shows DNA sequencing on pooled genomic
DNA, over SNP 461R; the sequence of the nucleic acid
should be [C/T]TGCAGA. 40 µl PCR product was incubated
with 15 µl magnetic beads (10 µg/µl) and 25 µl 2×BW
buffer. Pyrosequencing™ was then performed on a PSQ™
96 system instrument using Pyrosequencing™ SNP reagent
kit. The peak heights were measured in order to calculate the
frequency of the allele. The results are shown generally as
nucleotide incorporated (i.e. A, C, G or T) versus amount of
light released (in RLU). The 2 nucleotide incorporations
which relate to the SNP are marked. The experimental
conditions are as described in Example 4.

FIG. 4a depicts graphically relative peak heights from a
Pyrosequencing reaction plotted against allele frequency.
The SNP analysed was SNPE1. 5 pmol pooled DNA PCR
product was incubated with 17.5 µl magnetic beads, and
Pyrosequencing™ was performed using the primer as shown
in Example 1. The resulting peak heights were plotted versus
expected allele frequency, and a linear relationship between
the 2 was demonstrated. The experimental conditions are as
set out in Example 5.

FIG. 4b depicts graphically relative peak heights from a
Pyrosequencing reaction plotted against allele frequency.
The SNP analysed was SNPE7. 5 pmol pooled DNA PCR
product was incubated with 17.5 µl magnetic beads, and
Pyrosequencing™ was performed using the primer as shown
in Example 1. The resulting peak heights were plotted versus
expected allele frequency, and a linear relationship between
the 2 was demonstrated. The experimental conditions are as
set out in Example 5.

FIG. 4c depicts graphically relative peak heights from a
Pyrosequencing reaction plotted against allele frequency.
The SNP analysed was SNPE4. 5 pmol pooled DNA PCR
product was incubated with 17.5 µl magnetic beads, and
Pyrosequencing™ was performed using the primer as shown
in Example 1. The resulting peak heights were plotted versus
expected allele frequency, and a linear relationship between
the 2 was demonstrated. The experimental conditions are as
set out in Example 5.

FIG. 5 is a further representation of FIG. 4b. Also
depicted on this figure are the Pyrogram™ plots showing 
25% C, 50% C and 75% G peaks, which are correlated to 
points on the linear plot. Experimental conditions are 
described in Example 5. 

FIG. 6 depicts the obtained allele frequency results from
Pyrosequencing™ for SNP 1000F and the expected allele
frequency for the sample. The results are plotted as obtained
allele frequency (%) versus expected allele frequencies (%).
The standard line shows an imaginary pattern for an "ideal"
SNP. 30 µl of PCR product was used for Pyrosequencing™,
as described in Example 5.

FIG. 7 depicts the obtained allele frequency results from
Pyrosequencing™ for SNP 345F and the expected allele
frequency for the sample. The results are plotted as obtained
allele frequency (%) versus expected allele frequencies (%).
The standard line shows an imaginary pattern for an "ideal"
SNP. 30 µl of PCR product was used for Pyrosequencing™,
as described in Example 5. Two pools were made, with
expected allele frequencies of 10% A and 26% A.

FIG. 8a shows DNA sequencing on pooled genomic DNA
over SNP 345F (A/GGGG). 30 µl of PCR product was
incubated with 10 µl magnetic beads and 20 µl of 2×BW
buffer. Pyrosequencing™ was then performed on a
PSQ™ 96 system instrument using Pyrosequencing™ SNP
reagent kit. The resultant emitted light caused by nucleotide
incorporation was measured and plotted as nucleotide
incorporation versus light emitted (RLU). For this experiment the
addition of the nucleotides was such that the SNP was
represented in 3 consecutive peaks (marked). The
experimental conditions are as described in Example 5.

FIG. 8b shows DNA sequencing on pooled genomic DNA
over SNP 345F (A/GGGG). 30 µl of PCR product was
incubated with 10 µl magnetic beads and 20 µl of 2×BW
buffer. Pyrosequencing™ was then performed on a
PSQ™ 96 system instrument using Pyrosequencing™ SNP
reagent kit. The resultant emitted light caused by nucleotide
incorporation was measured and plotted as nucleotide
incorporation versus light emitted (RLU). For this experiment the
addition of the nucleotides was such that the SNP was
represented in only 2 consecutive peaks (marked). The
experimental conditions are as described in Example 5.

FIG. 9 depicts the obtained mean allele frequency results
from Pyrosequencing™ for SNP 471F and the expected
allele frequency for the sample. The results are plotted as
mean allele frequency (calculated from 10 replicates) (%)
versus expected allele frequencies (%). The standard line
shows an imaginary pattern for an "ideal" SNP. 30 µl of PCR
product was used for Pyrosequencing™, as described in
Example 5. Four pools were collated, with expected allele
frequencies of 68.7%, 78.6%, 91.7% and 95.5% G.

FIG. 10a depicts the allele frequency obtained via
Pyrosequencing™ compared to the expected allele frequency for
that pool, in percentage. 3 artificial oligonucleotides were
investigated, and the results for all 3 oligonucleotides are
depicted. The plot is obtained allele frequency vs expected
allele frequency. The oligonucleotides were used at a
concentration of 1 pmol/µl, and Pyrosequencing was performed
as described in Example 5. The mean frequency was
calculated from 10 replicate experiments.

FIG. 10b depicts the results obtained for oligo 1, as shown 
on FIG. 10a. 

FIG. 10c depicts the results obtained for oligo 2, as shown
on FIG. 10a. 

FIG. 10d depicts the results obtained for oligo 3, as shown
on FIG. 10a. 

FIG. 11a represents graphically estimated allele frequency
for the G allele of SNP 465R versus template amount
in the PCR reaction; the allele frequency was determined via
Pyrosequencing™. 4 pools with the same allele frequency
were set up using long, 10 ng, 1 ng, 0.1 ng and 0.05 ng of
genomic DNA prior to PCR. The experimental conditions
are as described in Example 6. The expected frequency of
the G allele for each of the 4 pools was 31%.

FIG. 11b represents graphically estimated allele frequency
for the G allele of SNP 465R versus template amount
in the PCR reaction; the allele frequency was determined via
Pyrosequencing™. 4 pools with the same allele frequency
were set up using long, 10 ng, 1 ng, 0.1 ng and 0.05 ng of
genomic DNA prior to PCR. The experimental conditions
are as described in Example 6. The expected frequency of
the G allele for each of the 4 pools was 12.5%.

FIG. 11c represents graphically estimated allele frequency
for the G allele of SNP 465R versus template amount in the
PCR reaction; the allele frequency was determined via
Pyrosequencing™. 4 pools with the same allele frequency
were set up using 10 ng, 1 ng, 0.1 ng and 0.05 ng of genomic
DNA prior to PCR. The experimental conditions are as
described in Example 6. The expected frequency of the C
allele for each of the 4 pools was 19%.

FIG. 11d represents graphically estimated allele frequency
for the G allele of SNP 465R versus template amount
in the PCR reaction; the allele frequency was determined via
Pyrosequencing™. 4 pools with the same allele frequency
were set up using long, 10 ng, 1 ng, 0.1 ng and 0.05 ng of
genomic DNA prior to PCR. The experimental conditions
are as described in Example 6. The expected frequency of
the C allele for each of the 4 pools was 6%.

FIG. 12 represents graphically estimated allele frequency
obtained via Pyrosequencing™ versus peak height obtained
via Pyrosequencing™. 4 different SNPs were investigated:
481R, 486R, 460R and 470R. The expected allele frequencies
were as follows: 470R, 55% A; 481R, 19.5% G;
486R, 12.5% C; and 460R, 6% G. Pyrosequencing™ was
performed on 5 different amounts of PCR product of pooled
DNA: 30 µl, 20 µl, 15 µl, 10 µl and 5 µl. The experimental
conditions are as described in Example 6.
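
Several of the figure descriptions above state that peak heights were measured in order to calculate the frequency of the allele. A minimal sketch of one way such a calculation might be carried out is given below. It assumes that each allele of the SNP gives rise to exactly one peak and that peak height is proportional to the amount of nucleotide incorporated, with no background or homopolymer correction. The function name and the peak values are hypothetical and are not taken from the Examples.

```python
def allele_frequencies_from_peaks(peak_heights):
    """Convert allele-specific peak heights (e.g. in RLU) into percentages.

    peak_heights maps each allele to the height of the peak produced when
    the nucleotide specific to that allele is incorporated.
    """
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("the summed peak height must be positive")
    return {allele: 100.0 * height / total for allele, height in peak_heights.items()}


# Hypothetical peak heights for a C/A SNP measured on a pooled sample.
print(allele_frequencies_from_peaks({"C": 30.0, "A": 10.0}))
```

The same calculation extends to triallelic SNPs by supplying three peaks, since each frequency is simply that allele's share of the summed allele-specific signal.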

DETAILED DESCRIPTION OF THE INVENTION

Accordingly, the present invention provides a method of 
determining the frequency of an allele in a population of 
nucleic acid molecules, said method comprising: 

pooling the nucleic acid molecules of said population,
performing primer extension reactions using a primer which 
binds at a predetermined site located in said nucleic acid 
molecules, and obtaining a pattern of nucleotide incorpora- 
tion. 

Further, the present invention provides a method of determining
the amount of an allele in a sample of nucleic acid
molecules, said method comprising: 

performing primer extension reactions on said nucleic 
acid molecules, using a primer which binds at a predetermined
site located in at least one said molecule, and determining
which and/or how many nucleotides are incorporated
in said reaction, and analysing said nucleotide incorporation 
information thus obtained in order to determine the amount 
of occurrence of said allele in said sample. 

The nucleic acid molecules mentioned in the allele quantification
method above may be obtained from one individual,
i.e. an individual who is suspected to have additional
genes, chromosomes or genomes present (i.e. trisomy or
duplication of chromosomes). The nucleic acid molecules of
the sample thus contain, or are suspected to contain, 3 or
more alleles (e.g. 3, 4, 5 alleles). The method of the
invention thus quantifies the number of alleles present (and
hence the number of nucleic acid molecules which contain
them), thus allowing diagnosis of gene, chromosome or
whole genome duplications (or other multiplications). Thus,
for example, an individual with a particular trisomy will
contain 3 copies of a chromosome instead of 2. Accordingly,
a sample from that individual will contain 3 nucleic acid
molecules corresponding to, or deriving from, that chromosome,
rather than two. By quantifying the amount of an
allele present in that molecule, the amount of the molecule,
and hence the chromosome number, may be determined. In
analogous fashion other duplications (i.e. replications or
multiplications or indeed loss of chromosomes (e.g. chromosome
number abnormalities), genes, genomes or other
nucleotide sequences) may be determined. In this method an
allelic variant or a particular allele may be used as a marker
of a particular gene or chromosome or other genetic
(i.e. nucleotide) sequence it is desired to quantify.
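
The reasoning above can be illustrated with a small sketch. It assumes an individual who carries both alleles of a marker, so that two copies of the chromosome give an expected allele fraction of 1/2, whereas three copies give 1/3 or 2/3. The function name, the candidate copy numbers and the nearest-ratio rule are assumptions made for illustration only and do not represent a diagnostic procedure.

```python
def closest_allele_ratio(observed_fraction, max_copies=4):
    """Return (allele copies, total copies) whose ratio is nearest the observation.

    Ratios are tried from the lowest total copy number upwards, and ties keep
    the earlier candidate, so 0.5 is reported as 1 of 2 rather than 2 of 4.
    """
    best = None
    for total_copies in range(2, max_copies + 1):
        for allele_copies in range(1, total_copies):
            error = abs(observed_fraction - allele_copies / total_copies)
            if best is None or error < best[0]:
                best = (error, allele_copies, total_copies)
    return best[1], best[2]


# Hypothetical measurements of the fraction of signal due to one allele.
print(closest_allele_ratio(0.50))  # consistent with 1 copy out of 2
print(closest_allele_ratio(0.66))  # consistent with 2 copies out of 3
```

The tie between 1/2 and 2/4 shows that an allele fraction alone cannot distinguish every copy number, so in practice the measurement error of the assay and further markers would need to be taken into account.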

Primer extension reactions are thus performed using the
nucleic acid molecules in the pool or sample as templates.
The primer, which is designed or selected to bind at a
particular site in the template (e.g. adjacent, or upstream or
downstream of, e.g. near to a test SNP of interest) is simply
added to the sample (e.g. pooled sample for allele frequency
determination) and will bind to the different template molecules
present. Primer extension reactions (e.g. performed
using polymerase and added nucleotides) are thus performed
simultaneously or substantially simultaneously. By detecting
the incorporation or non-incorporation of a given added
nucleotide, a "pattern" of nucleotide incorporation may be
determined which may be used to provide data which is
informative on the nature of the alleles in question, and on
their frequency, or occurrence (e.g. presence or absence) in
the tested population. Thus, data, which may be quantitative
and/or qualitative, may be obtained which may be correlated
to (or which may provide information relating to) the
frequency of an SNP allele (i.e. the "test" or "target" SNP or
"test" or "target" allele) in the tested population. In other
words, the method of the invention may be used to obtain
quantitative and/or qualitative data on nucleotide incorporation
relating to the SNP or allelic variant of interest.
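
As a hedged illustration of how such a pattern might carry frequency information, the sketch below models the signal from a pool as the frequency-weighted sum of the incorporation counts expected for each allele. The two reads correspond to the C and A alleles of the sequence T[C/A]TCTGG mentioned for FIG. 3a, while the dispensation order, the 70%/30% pool composition and the function names are assumptions made for this example.

```python
def incorporation_counts(read_sequence, dispensations):
    """Number of bases incorporated at each dispensation for one allele."""
    position = 0
    counts = []
    for nucleotide in dispensations:
        run = 0
        while position < len(read_sequence) and read_sequence[position] == nucleotide:
            run += 1
            position += 1
        counts.append(run)
    return counts


def pooled_pattern(allele_frequencies, dispensations):
    """Frequency-weighted sum of the per-allele incorporation counts."""
    pattern = [0.0] * len(dispensations)
    for read_sequence, frequency in allele_frequencies.items():
        counts = incorporation_counts(read_sequence, dispensations)
        for index, count in enumerate(counts):
            pattern[index] += frequency * count
    return pattern


# Hypothetical pool: 70% of templates carry the C allele, 30% the A allele.
pool = {"TCTCTGG": 0.7, "TATCTGG": 0.3}
print(pooled_pattern(pool, "TCATCTG"))
```

In this idealised model the second and third dispensations (C and A) give signals in proportion to the two allele frequencies, while the remaining dispensations give the same signal whatever the composition of the pool.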

As will be described further below, the nucleotide incorporation
may be detected in various ways, and different
ways of performing the primer extension reaction are possible.
For example, the different nucleotides (i.e. having the
different bases, e.g. A, T, C or G) may be added together, in
a form in which they are distinguishable from one another 
(e.g. by being provided with distinguishable detectable 
moieties e.g. labels). More preferably however, different 
nucleotides may be added individually, e.g. in turn (i.e. 
sequentially) and the incorporation or non-incorporation of 
each nucleotide determined. As will be described further 
below, depending on the detection system selected, and/or 
on the target allele/SNP under test, it may not be necessary 
to add or use all four nucleotides (i.e. all of A, T, C or G), 
but a desired selection thereof. 

The term "allele frequency" as used herein refers to the 
level or occurrence, or more particularly, the percentage of 
a particular allele in a given population. An allele is one of 
several alternative forms of a gene or nucleotide sequence at 
a specific chromosomal location. An allele can be any 
genetic variation at a given position within the nucleic acid 
sample. As explained above, an allele may be represented by 
one or more base changes at a given locus (e.g. an SNP). At 
each autosomal locus a diploid individual possesses 2 alle- 
les, one maternally inherited, the other paternally. Particu- 
larly, the allele frequency determination method of the 
invention includes methods for determining SNP or other 
allelic variant allele frequencies. Each diploid individual 
possesses 2 alleles at a given locus. If both of the alleles are 
identical, the individual is homozygous for that locus. If the 
alleles are different, the individual is heterozygous for that 
locus. In the method of the invention, the frequency of each 
allele in the population is determined, but data on the 
genotype (i.e. whether the individual is homozygous for a 
particular allele) of a particular individual in the population 
will not be determined by this method. 
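
By way of illustration only, the following minimal sketch shows why a population-level allele frequency does not reveal individual genotypes: two hypothetical populations with different genotype compositions give the same allele percentages. The function name and the example genotypes are assumptions made for this example and are not part of the described method.

```python
from collections import Counter


def allele_percentages(genotypes):
    """Percentage of each allele, given diploid genotypes such as "CT"."""
    # Each genotype string contributes its two alleles to the tally.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: 100.0 * n / total for allele, n in counts.items()}


# Hypothetical population of ten heterozygous individuals.
all_heterozygous = allele_percentages(["CT"] * 10)
# Hypothetical population of five CC and five TT homozygotes.
mixed_homozygous = allele_percentages(["CC"] * 5 + ["TT"] * 5)
```

Both calls yield the same percentages for C and T, although no individual shares a genotype across the two populations.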

Where allele frequency determination (i.e. allele
quantification) is performed on a single sample (e.g. a sample from
a single individual, for example with a suspected chromosome
number abnormality, e.g. trisomy), no pooling is needed.

The term "biallelic marker" as used herein refers to a 
genetic marker which only occurs in two forms in the 
population. SNPs are normally biallelic markers, although 
some triallelic or tetra-allelic SNPs are known and therefore 
the method of the invention will determine the frequency of 
each of the two or three or four possible alleles ("allelic 
variants") in a given population. 

The term "population" as used herein refers to a collection 
of individuals, or a group. For example, the individual could 
be a cell, in which case the population would be a collection 
of cells from one or more entities or from different sites of 
a multi-cellular organism, or indeed cells at different stages 
(e.g. life stages of an organism or at different stages of the 
cell cycle) or a population of cells of a unicellular organism 
(e.g. a prokaryote). Alternatively, the individual may be a 
cell component, i.e. mitochondria. Further, the population 
may comprise individuals of the same species (i.e. humans,
domestic animals, livestock animals, plants etc.) who may or
may not inhabit the same areas, region or country. The
population may be selected on the basis of nationality, ethnic
background, disease status, or on the basis of any other
classification. Further, the population may be selected on the
basis of disease susceptibility (i.e. at risk of developing
cardiovascular disease) or on the basis of lack of
susceptibility to disease. Familial populations (i.e. all living
members of one family group or sub-division of a family, e.g.
particular sibling groups) may be used. A "population" may
also comprise a sample of a particular cell type or tissue
from different individuals e.g. a tumour, or particular organ
etc. Thus, a population may comprise nucleic acid molecules
derived from a particular tissue type or diseased tissue from
a number of different individuals having or exhibiting that
tissue or cell type, or tumour etc. The "population" as
defined herein may comprise any number of individuals,
from 2 or more, to several thousand (i.e. 2 to 10,000, 2 to
8,000, 2 to 5,000).

For the analysis of gene, chromosome or genome number
(i.e. quantification or multiplication detection), the
individual is defined as "the population". The sample from an
individual may contain a variable amount or number of a
given (e.g. target) nucleic acid molecule. This allele
quantification can be performed on single samples which may
contain a variable number or amount of a target nucleic acid
molecule (target allele).

The term "pooled nucleic acid molecules" as used herein
refers to the pooling of nucleic acid molecules into one
reaction mixture from all individuals of a given population
(i.e. the adding together of the different or individual nucleic
acid samples to create a pooled sample). Therefore, multiple
individual nucleic acid molecules are pooled prior to genetic
analysis. Pooling of nucleic acid molecules is sample size
independent, i.e. independent of the number of samples
comprising the pool.

"Multiple" as used herein means two or more e.g. 3, 4, 5,
6, 8, 10 or more, or 100, 200, 500, 1000, 2000, 5000 or
10000 or more.

Conveniently, the nucleic acid molecule may be DNA,
although determining the allele frequency of RNA (e.g.
mRNA) is also within the invention. If it is desired to use an
RNA sample, the method may additionally include the step
of generating cDNA from the RNA template, conveniently
by using reverse transcriptase. Alternatively, if desired, the
primer extension reactions may be performed directly on
RNA templates.

The target nucleic acid may thus be any nucleic acid,
isolated or synthetic, in any desired or convenient form. It
may thus be genomic DNA, or isolated mRNA which may
be used directly for analysis by the method of the invention,
or it may be a nucleic acid product derived therefrom (or
corresponding thereto), e.g. by synthesis, such as cDNA as
mentioned above, or an amplification product (e.g. PCR
amplicon), clones or library products etc.

In carrying out the method of the invention, a primer
specific for the allele of interest is provided which binds to
the nucleic acid molecules at a predetermined site. The
primer is designed or selected so that when the primer
extension reaction is performed, the primer is extended over
the allele (or SNP) in the nucleic acid. In other words, the
primer binds to the nucleic acid molecule at, or near to (e.g.
within 1 to 20, 1 to 10 or 1 to 6 bases), the allele/SNP.

It will be understood that in order to perform the invention
the primer binding site should be available in all individual
nucleic acid molecules in the pooled population. Such
primer binding sites will therefore advantageously lie in
regions which are common to, or substantially conserved
between the different individuals in the population. This may
readily be achieved by selecting the primer binding site to lie
in conserved/semi-conserved regions as discussed above.

It will therefore be understood that in the pooled nucleic
acid, there will generally be 2 "allelic variants" present for
each SNP. Thus, at a given polymorphic position, the
nucleotide may be either one of two possible bases. In the
case of a triallelic SNP, there will be one of 3 possible bases.
In the case of tetra-allelic SNPs there will be one of
four possible bases.

Preferably, the polymorphic position is not sequenced
within a homopolymeric stretch in either allelic variant. As
used herein a homopolymeric stretch is defined as a stretch
of nucleic acid which contains two or more (i.e. 3 or more,
4 or more or 5 or more) consecutive identical nucleotides
(i.e. GC[...]T). However, primers can be designed to avoid
sequencing the homopolymeric stretch whilst obtaining data
on the allele frequency. Therefore, with well designed
primers, estimating allele frequencies of alleles present in
homopolymeric stretches is within the scope of the
invention. It is possible to design the primer in order to avoid
sequencing the repeated bases. The extension primer can
thus be designed to cover the homopolymeric region.

Further, by the use of appropriate controls or conditions,
and depending on the detection method chosen, it is possible
to determine the frequency of an allele if the SNP is in a
homopolymeric stretch.

The primer extension reactions conveniently may be
performed by sequentially adding nucleotides to the reaction
mixture (i.e. polymerase and primer/template mixture).
Advantageously, the different nucleotides are added in a
known, predetermined order. As each nucleotide is added, it
may be determined whether or not nucleotide incorporation
takes place.

Advantageously, as described in more detail below, the
amount of nucleotide incorporated (i.e. how many
nucleotide residues) may be determined. Such a quantitative
embodiment, wherein nucleotide incorporation is
determined quantitatively, represents a preferred aspect of the
invention.

In this manner, sequencing data may be obtained for the
polymorphic position in all nucleic acid molecules in the
pooled samples. This sequencing data comprises the base
identity (i.e. sequence) of the particular SNP residue,
together with quantitative data on how many nucleotides of
each type have been incorporated. In other words, the data
corresponds to the allele frequency for the given SNP. The
allele frequency may thus readily be calculated using the
quantitative values obtained for nucleotide incorporation
during primer extension wherein the primer is extended over
the polymorphic position.

Thus, by identifying how much of each nucleotide is
incorporated at the polymorphic site in a primer extension
reaction, it is possible to calculate the frequency of each
allele.

In order to perform the invention, it may be advantageous
or convenient first to amplify the nucleic acid molecule by
any suitable amplification method known in the art. The
target nucleic acid would then be an amplicon. Suitable in
vitro amplification techniques include any process which
amplifies the nucleic acid present in the reaction under the
direction of appropriate primers. The amplification method may
thus preferably be PCR, or any of the various modifications
thereof e.g. the use of nested primers, although it is not
limited to this method. Those skilled in the art will
appreciate that other amplification procedures may also be used,
such as Self-sustained Sequence Replication (3SR),
NASBA, the Q-beta replicase amplification system and 
Ligase chain reaction (LCR) (see for example Abramson and 
Myers (1993) Current Opinion in Biotech., 4: 41-47). If 
PCR is used to amplify the nucleic acid, suitable primers are
designed to ensure that the region of interest within the
nucleic acid sequence (i.e. the region containing the SNP)
is amplified. PCR can also be used for indiscriminate
amplification of all nucleic acid sequences, allowing ampli- 
fication of essentially all sequences within the sample for 
study (i.e. total nucleic acid). Linker-primer PCR is particu- 
larly suitable for indiscriminate amplification, and uses 
double stranded oligonucleotide linkers with a suitable over- 
hanging end, which are ligated to the ends of target nucleic 
acid fragments. Amplification is then conducted using oli- 
gonucleotide primers which are specific for the linker 
sequences. Alternatively, completely random oligonucle- 
otide primers may be used in conjunction with DOP-PCR 
(degenerate oligonucleotide-primed) to amplify all the 
nucleic acid within a sample. 

One or more of the amplification primers used in the
amplification reaction may be subsequently used as an
"extension primer", but this will preferably be a different
primer. It will be appreciated that the sequence and length of
the oligonucleotide amplification and extension primers to
be used in the amplification and extension steps,
respectively, will depend on the sequence of the target nucleic acid,
the desired length of amplification or extension product, the
further functions of the primer (e.g. for immobilization) and
the method used for amplification and/or extension.
Appropriate primers may readily be designed applying principles
and techniques well known in the art. 

Advantageously, as mentioned above, an extension 
primer will bind substantially adjacent (e.g. within 1-20, 
1-10 or 1-6, preferably within 1-3 bases), or exactly 
adjacent to the SNP of the target nucleic acid molecules and 
may be complementary to a conserved or semi-conserved 
region of the nucleic acid molecules. In order for the method 
of the invention to be performed, knowledge of the sequence 
surrounding the allele (e.g. of the conserved or semi-con- 
served region) is required in order to design an appropriate 
complementary extension primer. The specificity is achieved 
by virtue of complementary base pairing. For all embodi- 
ments of the invention, primer design may be based upon 
principles well known in the art. It is not necessary for the 
extension or amplification primer to have absolute comple- 
mentarity to the binding site, but this is preferred to improve 
the specificity of binding. 
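
The placement of an extension primer relative to the SNP can be sketched as follows. This is an illustrative assumption-laden example only: the function name, the default primer length, the `gap` parameter and the example sequence are all hypothetical, and the sequence is written as the strand to be synthesised (i.e. the complement of the template to which the primer anneals). It ignores melting temperature, secondary structure and the other design principles known in the art.

```python
def extension_primer(synth_strand, snp_index, length=18, gap=0):
    """Pick a primer whose 3' end lies `gap` bases 5' of the SNP.

    synth_strand is the strand to be synthesised (5'->3'), i.e. the
    complement of the template to which the primer anneals.
    gap=0 places the primer exactly adjacent to the SNP.
    """
    end = snp_index - gap      # index of the first base after the primer
    start = end - length
    if start < 0:
        raise ValueError("not enough flanking sequence for this primer")
    return synth_strand[start:end]


# Hypothetical sequence; "Y" (IUPAC code) marks a C/T SNP.
seq = "ACGTTGCAAGGCTTACCGATCGTAYGGATCC"
primer = extension_primer(seq, seq.index("Y"), length=18)
```

With `gap=0` the first nucleotide incorporated is the one at the polymorphic position; a larger `gap` (e.g. 1 to 3) leaves that many known bases to be extended before the SNP is reached.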

The extension primer may be designed to bind to the sense 
or anti-sense strand of the target nucleic acid.

The "primer extension" reaction according to the inven- 
tion includes all forms of template-directed polymerase- 
catalysed nucleic acid synthesis reactions. Conditions and 
reagents for primer extension reactions are well known in 
the art, and any of the standard methods, reagents and 
enzymes etc. may be used in this step (see e.g. Sambrook et
al. (eds), Molecular Cloning: A Laboratory Manual (1989),
Cold Spring Harbor Laboratory Press). Thus, the primer
extension reaction, at its most basic, is carried out in the
presence of primer, deoxynucleotides (dNTPs) and a suit- 
able polymerase enzyme e.g. T7 polymerase, Klenow or 
Sequenase Ver 2.0 (USB USA), or indeed any suitable 
available polymerase enzyme. As mentioned above, for an 
RNA template, reverse transcriptase may be used. Condi- 
tions may be selected according to choice, having regard to 
procedures well known in the art. 
The primer is thus subjected to a primer-extension
reaction in the presence of a nucleotide, whereby the nucleotide
is only incorporated if it is complementary to the base
immediately adjacent (3') to the primer position. The
nucleotide may be any nucleotide capable of incorporation by a
polymerase enzyme into a nucleic acid chain or molecule.
Thus, for example, the nucleotide may be a deoxynucleotide
(dNTP, deoxynucleoside triphosphate) or dideoxynucleotide
(ddNTP, dideoxynucleoside triphosphate). Thus, the
following nucleotides may be used in the primer-extension
reaction: guanine (G), cytosine (C), thymine (T) or adenine (A)
deoxy- or dideoxy-nucleotides. Therefore, the nucleotide
may be dGTP (deoxyguanosine triphosphate), dCTP
(deoxycytidine triphosphate), dTTP (deoxythymidine triphosphate)
or dATP (deoxyadenosine triphosphate). As discussed
further below, suitable analogues of dATP, and also for dCTP,
dGTP and dTTP may also be used. Thus, modified
nucleotides, or nucleotide derivatives (e.g. chemically modified
nucleotides) may be used so long as they are capable of
incorporation by a polymerase enzyme. Dideoxynucleotides
may also be used in the primer-extension reaction. The term
"dideoxynucleotide" as used herein includes all
2'-deoxynucleotides in which the 3' hydroxyl group is modified or
absent. Dideoxynucleotides are capable of incorporation
into the primer in the presence of the polymerase, but cannot
enter into a subsequent polymerisation reaction, and thus
function as a "chain terminator". It will therefore be
appreciated that in embodiments of the invention which rely on
sequential nucleotide addition the use of chain terminating
nucleotides is to be avoided (although so-called "false" or
"labile" terminators might be used in which the 3' blocking
group may be removed following incorporation. Such
modified nucleotides are known and described in the art).
However, in some embodiments of the invention it may be
advantageous to use chain terminating nucleotides whereby
it is desired to terminate sequencing of one allele after
incorporation of the chain terminating nucleotide, but more
sequence information is required for the other allele.
If the nucleotide is complementary to the target base, the
primer is extended by one nucleotide, and inorganic
pyrophosphate is released. As discussed further below, in a
preferred method, the inorganic pyrophosphate may be
detected in order to detect the incorporation of the added
nucleotide. For the SNP of interest, the addition of two
nucleotides will be sufficient to generate allele frequency
information. The primer bound to one allelic variant will be
extended by 1 nucleotide upon addition of the nucleotide
which base pairs to the nucleotide in the polymorphic
position. The primer bound to the other allelic variant will
therefore not be extended by addition of this nucleotide. This
primer will be extended in the next round of nucleotide
addition, which should be designed to be a complementary
base to the allelic variant (i.e. if the allelic variant is C, a G
should be added). Different nucleotides may be added
sequentially, advantageously in known order, as discussed
above, to reveal the nucleotides which are incorporated for
each extension primer. Accordingly, determining the number
of nucleotides incorporated for each nucleotide addition,
will reveal the frequency of that allele corresponding to
nucleotide incorporation and hence contribute to the
calculation of allele frequency.
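
The two-addition scheme described above can be sketched as a simple simulation. This is a hedged illustration only, assuming an idealised pool in which every primer/template complex is extended by exactly one base when the complementary nucleotide is added; the function name and the pool composition are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequential_extension(template_bases, dispensation_order):
    """Count primers extended at each single-nucleotide addition.

    template_bases lists, for every primer/template complex in the pool,
    the template base at the polymorphic position. A primer is extended
    only when the added nucleotide is complementary to that base.
    """
    counts = Counter()
    for nucleotide in dispensation_order:
        target = COMPLEMENT[nucleotide]
        counts[nucleotide] = sum(1 for base in template_bases if base == target)
    return counts


# Hypothetical pool: 30 templates carry C and 70 carry T at the SNP.
pool = ["C"] * 30 + ["T"] * 70
counts = sequential_extension(pool, ["G", "A"])   # add G first, then A
total = sum(counts.values())
# Map each added nucleotide back to the template allele it reports on.
frequencies = {COMPLEMENT[n]: counts[n] / total for n in counts}
```

Adding G extends only the primers bound to the C variant, and the subsequent addition of A extends those bound to the T variant, so the two incorporation counts together give the frequency of each allele.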

Hence, a primer extension protocol may involve annealing
a primer as described above, adding a nucleotide,
performing a polymerase-catalysed primer extension reaction,
detecting the presence or absence of incorporation of
said nucleotide (and advantageously also determining the
amount of each nucleotide incorporated) and repeating the
nucleotide addition and primer extension steps etc. one or
more times. As discussed above, single (i.e. individual)
nucleotides may be added successively to the same
primer-template mixture.

In order to permit the repeated or successive (iterative)
addition of nucleotides in a primer-extension procedure, the
previously-added nucleotide must be removed. This may be
achieved by washing, or more conveniently, by using a
nucleotide-degrading enzyme, for example as described in
detail in WO98/28440.

Accordingly, in a principal embodiment of the present
invention, a nucleotide degrading enzyme is used to degrade
any unincorporated or excess nucleotide. Thus, if a
nucleotide is added which is not incorporated (because it is not
complementary to the target base), or any added nucleotide
remains after an incorporation event (i.e. excess nucleotides)
then such unincorporated nucleotides may readily be
removed by using a nucleotide-degrading enzyme. This is
described in detail in WO98/28440.

The term "nucleotide degrading enzyme" as used herein
includes any enzyme capable of specifically or
non-specifically degrading nucleotides, including at least nucleoside
triphosphates (NTPs), but optionally also di- and
mono-phosphates, and any mixture or combination of such
enzymes, provided that a nucleoside triphosphatase or other
NTP-degrading activity is present. Where a chain
terminating nucleotide is used (e.g. a dideoxy nucleotide is used), the
nucleotide degrading enzyme should also degrade such a
nucleotide. Although nucleotide-degrading enzymes having
a phosphatase activity may conveniently be used according
to the invention, any enzyme having any nucleotide or
nucleoside degrading activity may be used, e.g. enzymes
which cleave nucleotides at positions other than at the
phosphate group, for example at the base or sugar residues.
Thus, a nucleoside triphosphate degrading enzyme is
essential for the invention. Nucleoside di- and/or mono-phosphate
degrading enzymes are optional and may be used in
combination with a nucleoside tri-phosphate degrading enzyme.

The preferred nucleotide degrading enzyme is apyrase,
which is both a nucleoside diphosphatase and
triphosphatase, catalysing the reactions NTP→NDP+Pi and
NDP→NMP+Pi (where NTP is a nucleoside triphosphate,
NDP is a nucleoside diphosphate, NMP is a nucleotide
monophosphate and Pi is inorganic phosphate). Apyrase
may be obtained from the Sigma Chemical Company. Other
possible nucleotide degrading enzymes include Pig Pancreas
nucleoside triphosphate diphosphohydrolase (Le Bel et al.,
1980, J. Biol. Chem., 255, 1227-1233). Further enzymes are
described in the literature.

The nucleotide-degrading enzyme may conveniently be
included during the polymerase (i.e. primer extension)
reaction step. Thus, for example the polymerase reaction may
conveniently be performed in the presence of a
nucleotide-degrading enzyme. Although less preferred, such an enzyme
may also be added after nucleotide incorporation (or
non-incorporation) has taken place, i.e. after the polymerase
reaction step.

Thus, the nucleotide-degrading enzyme (e.g. apyrase)
may be added to the polymerase reaction mixture (i.e. target
nucleic acid, primer and polymerase) in any convenient way,
for example prior to or simultaneously with initiation of the
reaction, or after the polymerase reaction has taken place,
e.g. prior to adding nucleotides to the
sample/primer/polymerase to initiate the reaction, or after the polymerase and
nucleotide are added to the sample/primer mixture.

Conveniently, the nucleotide-degrading enzyme may
simply be included in the reaction mixture for the polymerase
reaction, which may be initiated by the addition of the
nucleotide.

According to the present invention, detection of
nucleotide incorporation can be performed in a number of ways,
such as by incorporation of labelled nucleotides which may
subsequently be detected, or by using labelled probes which
[...] sequence.

The method may be performed using a Sanger sequencing
method combined with a standard detection strategy, e.g.
electrophoresis or mass spectrometry to analyse, or
determine, nucleotide incorporation. However, it is preferred to
use a sequencing-by-synthesis method, due to the fact that
the extension reactions are quantitative, i.e. that the
nucleotide incorporation may be determined quantitatively. As
mentioned above, sequencing-by-synthesis methods are
disclosed extensively in U.S. Pat. No. 4,863,849, which
discloses a number of ways in which nucleotide incorporation
may be determined or detected, e.g. spectrophotometrically
or by fluorescent detection techniques, for example by
determining the amount of nucleotide remaining in the
added nucleotide feedstock, following the nucleotide
incorporation step. Alternatively, labelled nucleotides may be
utilised in the nucleotide incorporation step. Such labelled
nucleotides may be chain terminating or capable of further
extension. The nucleotide incorporated may be identified
[...] neutralised prior to the incorporation [...]
described in U.S. Pat. No. 6,087,095 of Rosenthal et al. This
patent also describes sequencing-by-synthesis on a solid
[...]. The label may be a fluorescent label or a
[...].

The preferred method of sequencing-by-synthesis is
however a pyrophosphate detection-based method.

Preferably, therefore, nucleotide incorporation is detected
by detecting PPi release, preferably by luminometric
detection, and especially by bioluminometric detection.

PPi can be determined by many different methods and a
number of enzymatic methods have been described in the
literature (Reeves et al., (1969), Anal. Biochem., 28,
282-287; Guillory et al., (1971), Anal. Biochem., 39,
170-180; Johnson et al., (1968), Anal. Biochem., 15, 273;
Cook et al., (1978), Anal. Biochem. 91, 557-565; and Drake
et al., (1979), Anal. Biochem. 94, 117-120).

It is preferred to use luciferase and luciferin in
combination to identify the release of pyrophosphate since the
amount of light generated is substantially proportional to the
amount of pyrophosphate released which, in turn, is directly
proportional to the amount of nucleotide incorporated. The
amount of light can readily be estimated by a suitable light
sensitive device such as a luminometer. Thus, luminometric
methods offer the advantage of being able to be quantitative.

Luciferin-luciferase reactions to detect the release of PPi
are well known in the art. In particular, a method for
continuous monitoring of PPi release based on the enzymes
ATP sulphurylase and luciferase has been developed (Nyren
and Lundin, Anal. Biochem., 151, 504-509, 1985; Nyren P,
Enzymatic method for continuous monitoring of DNA
polymerase activity (1987) Anal. Biochem Vol 167 (235-238))
and termed ELIDA (Enzymatic Luminometric Inorganic
Pyrophosphate Detection Assay). The use of the ELIDA
method to detect PPi is preferred according to the present
invention. The method may however be modified, for
example by the use of a more thermostable luciferase
(Kaliyama et al., 1994, Biosci. Biotech. Biochem., 58,
1170-1171) and/or ATP sulfurylase (Onda et al., 1996,
Bioscience, Biotechnology and Biochemistry, 60:10,
1740-42). This method is based on the following reactions:



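$$\mathrm{PPi} + \mathrm{APS} \xrightarrow{\text{ATP sulphurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}}$$

$$\mathrm{ATP} + \text{luciferin} + \mathrm{O_2} \xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PPi} + \text{oxyluciferin} + \mathrm{CO_2} + h\nu$$

(APS = adenosine 5'-phosphosulphate)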

Reference may also be made to WO 98/13523 and WO
98/28448, which are directed to pyrophosphate detection-based
sequencing procedures, and disclose PPi detection
methods which may be of use in the present invention.

In a PPi detection reaction based on the enzymes ATP
sulphurylase and luciferase, the signal (corresponding to PPi
released) is seen as light. The generation of the light can be
observed as a curve known as a Pyrogram™. Light is
generated by luciferase action on the product, ATP
(produced by a reaction between PPi and APS (see below)
mediated by ATP sulphurylase) and, where a
nucleotide-degrading enzyme such as apyrase is used, this light
generation is then "turned off" by the action of the
nucleotide-degrading enzyme, degrading the ATP which is the substrate
for luciferase. The slope of the ascending curve may be seen
as indicative of the activities of DNA polymerase (PPi
release) and ATP sulphurylase (generating ATP from the PPi,
thereby providing a substrate for luciferase). The height of
the signal is dependent on the activity of luciferase, and the
slope of the descending curve is, as explained above,
indicative of the activity of the nucleotide-degrading enzyme. As
explained below, in a Pyrogram™ in the context of a
homopolymeric region, peak height is also indicative of the
number of nucleotides incorporated for a given nucleotide
addition step. Then, when a nucleotide is added, the amount
of PPi released will depend upon how many nucleotides (i.e.
the amount) are incorporated, and this will be reflected in the
peak height.
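
The rise and fall of such a light curve can be illustrated with a deliberately simplified toy model. The sketch below is an assumption, not the kinetics of the actual enzyme system: it treats ATP production from PPi and ATP degradation as two first-order steps with hypothetical rate constants, lumps the luciferase step into a light output proportional to the ATP level, and integrates with a plain Euler scheme.

```python
def pyrogram_trace(ppi_released, k_sulfurylase=2.0, k_apyrase=0.5,
                   dt=0.01, steps=2000):
    """Euler integration of a toy two-step first-order model.

    PPi -> ATP       (ATP sulphurylase, rate constant k_sulfurylase)
    ATP -> degraded  (nucleotide-degrading enzyme, rate constant k_apyrase)
    Light output is taken to be proportional to the ATP level.
    """
    ppi, atp = float(ppi_released), 0.0
    trace = []
    for _ in range(steps):
        made = k_sulfurylase * ppi * dt   # ATP formed in this time step
        lost = k_apyrase * atp * dt       # ATP degraded in this time step
        ppi -= made
        atp += made - lost
        trace.append(atp)
    return trace


# One versus two nucleotides incorporated per template (arbitrary units).
trace_one = pyrogram_trace(1.0)
peak_one = max(trace_one)
peak_two = max(pyrogram_trace(2.0))
```

Because the toy model is linear, doubling the amount of PPi released doubles the peak height, which mirrors the statement that peak height reflects the number of nucleotides incorporated.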

The use of pyrophosphate detection-based sequencing
methods, and in particular those based on the ELIDA
detection enzymes, is particularly advantageous in the
present invention; the correlation between signals obtained
in such methods (i.e. peak heights) and SNP allele
frequencies has been shown to be excellent, and the accuracy of the
results obtained surprisingly high. Frequencies as low as
5% for one allele have been determined with reasonable
accuracy in pools of samples.

Advantageously, by including the PPi detection
enzyme(s) (i.e. the enzyme or enzymes necessary to achieve
PPi detection according to the enzymatic detection system
selected, which in the case of ELIDA, will be ATP
sulphurylase and luciferase) in the polymerase reaction step, the
method of the invention may readily be adapted to permit
extension reactions to be continuously monitored in
real-time, with a signal being generated and detected, as each
nucleotide is incorporated.

Thus, the PPi detection enzymes (along with any enzyme
substrates or other reagents necessary for the PPi detection
reaction) may simply be included in the polymerase reaction
mixture.

A potential problem which has previously been observed
with PPi-based sequencing methods is that dATP, used in the
chain extension reaction, interferes in the subsequent
luciferase-based detection reaction by acting as a substrate
for the luciferase enzyme. This may be reduced or avoided
by using, in place of deoxyadenosine triphosphate (dATP), a
dATP analogue which is capable of acting as a substrate for
a polymerase but incapable of acting as a substrate for a
PPi-detection enzyme. Such a modification is described in
detail in WO98/13523.

The term "incapable of acting" includes also analogues 
which are poor substrates for the detection enzymes, or 
which are substantially incapable of acting as substrates, 
such that there is substantially no, negligible, or no signifi- 
cant interference in the PPi detection reaction. 

Thus, a further preferred feature of the invention is the use
of a dATP analogue which does not interfere in the
enzymatic PPi detection reaction but which nonetheless may be
normally incorporated into a growing DNA chain by a
polymerase. By "normally incorporated" is meant that the
nucleotide is incorporated with normal, proper base pairing.
In the preferred embodiment of the invention where
luciferase is a PPi detection enzyme, the preferred analogue
for use according to the invention is the [1-thio] triphosphate
(or α-thiotriphosphate) analogue of deoxy ATP, preferably
deoxyadenosine [1-thio] triphosphate, or deoxyadenosine
α-thiotriphosphate (dATPαS) as it is also known. dATPαS,
along with the α-thio analogues of dCTP, dGTP and dTTP,
may be purchased from Amersham Pharmacia. Experiments
have shown that substituting dATP with dATPαS allows
efficient incorporation by the polymerase with a low
background signal due to the absence of an interaction between
dATPαS and luciferase. False signals are decreased by
using a nucleotide analogue in place of dATP, because the
background caused by the ability of dATP to function as a
substrate for luciferase is eliminated. In particular, an
efficient incorporation with the polymerase may be achieved
while the background signal due to the generation of light by
the luciferin-luciferase system resulting from dATP
interference is substantially decreased. It has been noted by the
inventors that the use of dATPαS can lead to higher peaks
than the use of dATP. The peak height is consistently higher,
and thus if dATPαS is used, the actual 'peak height' can be
calculated via a 'peak height reduction'. The dNTPαS
analogues of the other nucleotides may also be used in place
of the other dNTPs.
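
One possible reading of such a 'peak height reduction' is sketched below. The text does not specify how the reduction is calculated, so this is purely an assumption: peaks from A additions are divided by an empirically determined factor, and both the function name and the factor of 1.2 are hypothetical values chosen only for illustration.

```python
def reduce_a_peaks(peaks, a_peak_factor):
    """Divide peaks from dATP-alpha-S additions by an empirical factor.

    peaks is a list of (nucleotide, peak_height) pairs in dispensation
    order; only the "A" peaks are scaled.
    """
    if a_peak_factor <= 0:
        raise ValueError("a_peak_factor must be positive")
    return [(nt, h / a_peak_factor if nt == "A" else h) for nt, h in peaks]


# Hypothetical raw peaks and a hypothetical factor of 1.2 for A peaks.
corrected = reduce_a_peaks([("A", 12.0), ("G", 10.0)], a_peak_factor=1.2)
```

In practice the factor would have to be established experimentally for the reagents and instrument in use.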

The step of detecting nucleotide incorporation by detect- 
ing PPi release results in a signal indicative of the amount of 
pyrophosphate released, and hence the amount of nucleotide 
incorporated. 

In the method of the invention, the primer-extension 
reaction is performed simultaneously for each nucleic acid 
molecule in the reaction mixture. Thus, for every nucleotide 
addition to the reaction mixture, multiple nucleotides may 
be incorporated into the extended primers. The signal gen- 
erated in the pyrophosphate detection step will therefore be 
indicative of the number of nucleotides incorporated in the 
primer-extension step for the combination of all primers 
bound to the template nucleic acid. The size of the signal 
(i.e. the height of each peak) can therefore be correlated 
directly to the number of incorporated nucleotides. Typi- 
cally, the primer needs only to be subjected to 1 to 20, 
preferably 1 to 10, e.g. 1 to 5 and most preferably 2 to 4 
cycles of nucleotide addition. 
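
Since each peak height scales with the number of nucleotides incorporated, the allele frequencies in a pool can be estimated as each allele's share of the summed peak heights. The following minimal sketch assumes one peak per allele and no homopolymer contribution; the function name and the peak values are hypothetical.

```python
def allele_frequencies_from_peaks(peak_heights):
    """Convert per-allele peak heights into fractional allele frequencies."""
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("summed peak height must be positive")
    return {allele: h / total for allele, h in peak_heights.items()}


# Hypothetical peak heights (arbitrary light units) for a C/T SNP.
freqs = allele_frequencies_from_peaks({"C": 15.0, "T": 45.0})
```

The same function handles triallelic or tetra-allelic SNPs by supplying three or four peak heights.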

It will be understood that the order of nucleotide addition in the
reaction mixture can be tailored to each SNP to ensure that the
relevant allele frequency is obtained efficiently and accurately. For
example, if the 2 possible allelic nucleotides are C or T (or vice
versa), the order of nucleotide addition when extending the primer
over the polymorphic site may be C followed by T, using the methods
as described previously. Therefore, the peaks showing nucleotide
incorporation for the allelic variant bases should preferably be
adjacent to each other, facilitating calculation of the allele
frequencies.

As mentioned previously, the allele variants are preferably not
sequenced in a homopolymeric stretch of 3 or more identical bases. It
will be clear that the peak height in such a situation will represent
not only the nucleotide incorporation relating to the polymorphic
position, but will also represent the incorporation of 2 or more
nucleotides further downstream of the polymorphism. Thus, the number
of nucleotides incorporated will also reflect the number of
nucleotides present in the homopolymeric region, which will be the
same for each allelic variant. Therefore, it is advisable to avoid
performing allele frequency determinations on SNPs wherein one allelic
variant lies within a homopolymeric stretch of three or more identical
bases, unless a primer can be designed as described previously.

It will be understood that in order to obtain accurate and reliable
data relating to the frequency of an allele in a population, it will
be preferable to use the same amount of nucleic acid for each
individual in the population in the reaction mixture. Therefore, it
may be necessary to calibrate the samples prior to pooling. Thus, it
forms a preferred aspect of the invention to measure or determine the
concentration of the nucleic acid in the sample prior to pooling. Any
standard technique may be used to effect the measurement/determination
of nucleic acid concentration, such as gel electrophoresis and
spectrophotometry. However, these methods are not without their
drawbacks, as they rely upon having a significant sample of nucleic
acid to use for concentration determination. A further aspect of this
invention is thus using a primer-extension reaction to calibrate the
nucleic acid concentrations prior to pooling.

In order to perform primer extension reactions to calculate the
concentration of nucleic acid in a sample, it will first be necessary
to select a suitable SNP. A suitable SNP for such analysis will not be
present in a homopolymeric sequence and will not be preferentially
amplified in any PCR-type reactions. Further, the SNP should be chosen
such that it gives no background signals in a primer-extension
reaction, and that the signals, e.g. peak height, (see before) are
even. Preferably, each of the individuals has a known sequence
(genotype) at this SNP. If not, the sequence (genotype) can be
determined using standard sequence-by-synthesis reaction means. One
reference sample (Ref 1) is selected as the main reference from one of
the homozygotes, another reference sample (Ref 2) is selected from the
other homozygote, and are pooled, and the method of the invention as
previously described may be carried out. The results of the primer
extension reactions enable the relative concentrations of each
reference sample to be calculated, as the signals (e.g. peak heights)
(see before) are directly related to the amount of nucleotide
incorporation. To measure the concentration of the rest of the samples
in the population, these are pooled individually with one of the
reference samples. Heterozygote samples should be paired with one of
the homozygote references, and then analysed as mentioned previously.
Thus, as the concentration of the reference sample is known, the
concentration of the sample pooled with the reference sample can be
easily calculated. Homozygote samples should be pooled with the other
homozygote reference sample (i.e. pair AA with CC, not AA with AA).

The peak height for allele 1 (i.e. A) and the peak height for allele 2
(i.e. C) are recorded, and the following calculations are performed
(for an allele not present in a homopolymer stretch):

    Y = Peak height (allele 1) / [Peak height (allele 1) + Peak height (allele 2)]

where Y is the frequency of allele 1. The concentration in the [sample
is] calculated by multiplying the concentration of the reference by a
concentration factor (X). Therefore, X must be calculated. X is in
relation to the reference sample used. If the sample is heterozygous,
X is calculated in the following way:

    X = 2Y / (1 - 2Y)

However, if the sample is homozygous, the following calculation is
used:

    X = Y / (1 - Y)

Thus, once it has been decided what volume of one of the reference
samples is to be used in the pool, the volume of samples to be added
to the pool is calculated by dividing the [volume of the] reference
with the X value for each sample:

    Volume (sample n) = Volume (ref 1) / X (sample n)

Alternatively or additionally, once it has been decided what volume of
one of the reference samples is to be used in the pool, the volume of
the second reference sample is set by dividing the volume of reference
1 with the concentration factor (X) of reference 2.

    Volume (reference 2) = Volume (reference 1) / X (reference 2)

From these 2 volumes (reference 1 and reference 2) the volumes of
samples to be added to the pool is calculated by dividing the volume
for the reference with the X value for each sample. It is important to
use the correct reference for each sample (i.e. the reference this
sample has been compared to).

    Volume (sample n) = Volume (reference) / X (sample n)

Thus, although different volumes are used for each sample, the amount
of nucleic acid from each individual will be the same. Calculations
have been performed in Example 1.

The uniformity of nucleic acid amount of different individuals in the
population (i.e. in the individual nucleic acid samples which are
pooled) may vary, depending on the source and nature of the nucleic
acid, and indeed the importance of such uniformity (and hence the need
for calibration) may also vary, depending on the nucleic acid samples
used. Thus, when using pooled genomic DNA samples, uniformity of DNA
concentration between individual samples has been found to be of more
importance and it is preferred first to calibrate the sample
concentration for optimum results. However, calibration is not
absolutely necessary and the concentration of the nucleic acid in the
sample may be estimated by standard methods.

The calibration procedure will be of particular interest, if it is
important to know the exact allele frequencies in a pool, or if the
pool consists of a few samples and/or there are large differences in
the individual DNA concentrations.

The amount of template nucleic acid from the pool of nucleic acid used
for amplification has been found by the inventors under certain
circumstances to be important when performing allele frequency
studies. In order to obtain reproducible results, at least 10 ng,
preferably 10 to 100 ng, more preferably 10 to 50 ng and even more
preferably 10 to 20 ng of nucleic acid is generally preferred. Such
amounts are particularly recommended for genomic DNA but are equally
applicable to cases wherein PCR products are pooled.

Generally speaking the absolute level of signal detected (e.g. peak
height in a Pyrogram™), does not significantly affect the accuracy of
allele frequency determinations as long as the analysed signals (e.g.
peaks) are well above (i.e. distinguishably above) noise level.
Generally speaking however, the lowest peak in a Pyrogram™ is ideally
at least 2 RLU (relative light units) to distinguish from
noise/background. Single peak heights of at least 10 or 15 RLU have
generally been found to be reliable, particularly if one of the
alleles is represented at a low frequency.

Preferably, the concentration of the nucleic acid in the sample is
determined by a primer-extension reaction (as [described]
previously).

Preferably, the genomic nucleic acid from all individuals in the
population are pooled, and amplified prior to analysis. Suitable
amplification techniques have been discussed previously. As mentioned
before, the nucleic acid may be of any suitable nature. In order to
increase the accuracy of allele frequency calculations, it is
advisable to separate the nucleic acid pool prior to amplification
into "sub-pools" (or several PCR replicates) to enable multiple
allele-frequency assays of the invention to be performed for the same
allele. Preferably, there are 1 or more sub-pools (i.e. 2, 3, 4, 5, 6,
7, 8, 9, 10 or more), and therefore the same study is replicated 1 or
more times. As mentioned previously, there is preferably at least 10
ng of nucleic acid present in the pool prior to amplification.
Calculating an average allele frequency from the sub-pools improves
the accuracy of allele frequency determination when dealing with
genomic or amplified nucleic acid material. The use of amplified
nucleic acid in the method of the invention is also envisaged.
However, fewer replicate allele frequency experiments need to be
performed than if genomic nucleic acid is pooled.

In order for the primer-extension reaction (either for calibration or
allele frequency determination) to be performed, the nucleic acid
molecule, regardless of whether or not it has been amplified, is
conveniently provided in a single-stranded format. The nucleic acid
may be subjected to strand separation by any suitable technique known
in the art (e.g. Sambrook et al., supra), for example by heating the
nucleic acid, or by heating in the presence of a chemical denaturant
such as formamide, urea or formaldehyde, or by use of alkali.

However, this is not absolutely necessary and a double-stranded
nucleic acid molecule may be used as template, e.g. with a suitable
polymerase having strand displacement activity.

Where a preliminary amplification step is used, regardless of how the
nucleic acid has been amplified, all components of the amplification
reaction need to be removed, to obtain pure nucleic acid, prior to
carrying out the typing assay of the invention. For example,
unincorporated nucleotides, PCR primers, and salt from a PCR reaction
need to be removed. Methods for purifying nucleic acids are well known
in the art (Sambrook et al., supra), however a preferred method is to
immobilize the nucleic acid molecule, removing the impurities via
washing and/or sedimentation techniques.

Optionally, therefore, the target nucleic acid may be provided with a
means for immobilization, which may be introduced during
amplification, either through the nucleotide bases or the primer/s
used to produce the amplified nucleic acid.

To facilitate immobilization, the amplification primers used according
to the invention may carry a means for immobilization either directly
or indirectly. Thus, for example the primers may carry sequences which
are complementary to sequences which can be attached directly or
indirectly to an immobilizing support or may carry a moiety suitable
for direct or indirect attachment to an immobilizing support through a
binding partner.

Numerous suitable supports for immobilization of DNA and methods of
attaching nucleotides to them, are well known in the art and widely
described in the literature. Thus for example, supports in the form of
microtitre plate (MTP) wells, tubes, dipsticks, particles, beads,
fibres or capillaries may be used, made for example of agarose,
sepharose, cellulose, alginate, cellulose alginate, teflon, latex or
polystyrene. Advantageously, the support may comprise beads, e.g.
sepharose beads produced by Amersham Biosciences (Uppsala, Sweden), or
magnetic particles e.g. the superparamagnetic beads produced by Dynal
AS (Oslo, Norway) and sold under the trademark DYNABEADS®. Chips may
be used as solid supports to provide miniature experimental systems as
described for example in Nilsson et al. (Anal. Biochem. (1995),
224:400-408).

The solid support may carry functional groups such as hydroxyl,
carboxyl, aldehyde or amino groups for the attachment of the primer or
capture oligonucleotide. These may in general be provided by treating
the support to provide a surface coating of a polymer carrying one of
such functional groups, e.g. polyurethane together with a polyglycol
to provide hydroxyl groups, or a cellulose derivative to provide
hydroxyl groups, a polymer or copolymer of acrylic acid or methacrylic
acid to provide carboxyl groups or an amino alkylated polymer to
provide amino groups. U.S. Pat. No. 4,654,267 describes the
introduction of many such surface coatings. Alternatively, the support
may carry other moieties for attachment, such as avidin or
streptavidin (binding to biotin on the nucleotide sequence), DNA
binding proteins (e.g. the lac I repressor protein binding to a lac
operator sequence which may be present in the primer or
oligonucleotide), or antibodies or antibody fragments (binding to
haptens e.g. digoxigenin on the nucleotide sequence). The
streptavidin/biotin binding system is very commonly used in molecular
biology, due to the relative ease with which biotin can be
incorporated within nucleotide sequences, and indeed the commercial
availability of biotin-labelled nucleotides. This represents one
preferred method for immobilisation of target nucleic acid molecules
according to the present invention. Streptavidin-coated DYNABEADS® are
commercially available from Dynal AS, and streptavidin-coated
Sepharose beads are commercially available from Amersham Biosciences.

As mentioned above, immobilization may conveniently take place after
amplification. To facilitate post amplification immobilisation, one or
both of the amplification primers are provided with means for
immobilization. Such means may comprise as discussed above, one of a
pair of binding partners, which binds to the corresponding binding
partner carried on the support. Suitable means for immobilization thus
include biotin, haptens, or DNA sequences (such as the lac operator)
binding to DNA binding proteins.

When immobilization of the amplification products is not performed,
the products of the amplification reaction may simply be separated by
for example, taking them up in a formamide solution (denaturing
solution) and separating the products, for example by electrophoresis
or by analysis using chip technology. Immobilization provides a ready
and simple way to generate a single-stranded template for the
extension reaction. As an alternative to immobilization, other methods
may be used, for example asymmetric PCR, exonuclease protocols or
quick denaturation/annealing protocols on double stranded templates
may be used to generate single stranded DNA. Such techniques are well
known in the [art].

The method of the invention allows the determination of the frequency
of an allele in a population (i.e. a group of individuals exhibiting
disease or trait, a familial group, an ethnic group, a geographical
group), wherein the allele assessed is a single nucleotide
polymorphism (SNP) or any other allelic variant.

The method of the present invention is particularly advantageous in
determining whether a particular allelic variant is linked to disease
or trait. To enable such determination, 2 or more (i.e. 3 or 4, 5, 6,
7, 8, 9 or 10) pools of nucleic acid molecules are analyzed. One pool
comes from a population exhibiting said disease or trait, whilst the
second pool is selected from a population which does not exhibit said
disease or trait. If the frequency of one allelic variant is greater
in the 'diseased' population, this points towards the allele being
associated with the disease or trait. However, it will be appreciated
that the method of the invention can be performed on 1 pool in
isolation.

The method of the present invention may be used to confirm whether an
allelic variation is present in a population. For example, an SNP may
be identified in silico (by searching databases and homologues) or
identified in one population (i.e. an isolated geographical group or
ethnic group), and it may be desirable to ascertain the frequency of
an allele in another population (i.e. a different ethnic group or
different familial group).

The method of the present invention is particularly advantageous in
studies of mutations associated with cancer. In this case, the
population is a sample of cells removed from a patient (i.e. human,
livestock animal, domestic animal or laboratory animal). In the
population of cells, there will be a mixture of healthy and diseased
cells, and the nucleic acid from all cells in the population will be
pooled. The population can then be scanned for SNPs which are
associated with diseased state in the patient, giving patient-specific
information on the disease-associated allele and the frequency of that
allele in a population of cells. This type of information could be
invaluable in the treatment of cancer, by aiding diagnosis and
prognosis. Further, knowledge of the allele involved can allow the
tailoring of treatment for the allele involved; this technology is
known as pharmacogenomics. Repeated testing of a population of cells
from an individual can give an estimation of the proportion of cells
that are carrying the disease-associated allele. By using the method
of the invention, it is possible to separate the mixed genotypes
present in the mixed cell populations. This is a great advantage over
prior methods where mixed genotypes were indicated due to a mixture of
cell types being present. It will be understood that this technology
could also be used to analyse multiploid genomes (e.g. plants). A
further application of determining allele frequency from a population
of cells is that loss of heterozygosity can be examined. This will
detect whether a segment of chromosome has been lost in tumour tissue.

A further application of the method of the invention is testing for
'genetic drift'. Using the method of the invention, it will be
possible to obtain data on a particular allele frequency within a
given population at given time intervals, and determine whether over
time, the frequency of an allele changes. This type of analysis will
therefore involve taking nucleic acid samples from multiple
generations in a population. It is thought that genetic drift is a
useful indicator of evolutionary change, and the method of the
invention will be able to measure such allele frequency change quickly
and simply.

A further application of the method of the invention is for
quantification of a gene/allele in human samples for trisomy tests (or
other chromosome abnormalities or gene multiplication etc). This is
important in different syndromes where one chromosome occurs in three
copies instead of two as normal. A well-known syndrome is Downs
Syndrome or trisomy-21. Other trisomies are trisomy-13, and 18. Other
syndromes related to duplications of sex chromosomes (or other
chromosome number abnormality) can also be analysed using the method
of the invention. This can be performed by quantifying the number of
alleles of any gene (or indeed any particular selected nucleotide
sequence containing allelic variation or polymorphism) on the selected
chromosome.

[The present] invention is advantageous in that it determines the
exact sequence of the SNP or allelic variant, together with a direct
measurement of the amount of nucleotide incorporated. The primer
extension reaction generates a "pattern" indicative of nucleotide
incorporation, correlated [to the] nucleotide added to the reaction
mixture. The pattern [is a] cumulative picture of nucleotide
incorporation for the [primers] bound to all of the nucleic acid
molecules present in [the] pool. [illegible] allele frequency of an
SNP or allelic [variant to be] determined, several measurements
[illegible] allele frequency to be calculated. The height of the peak
(see before) for each [allelic variant] residue needs to be measured,
which should [be] present adjacent to each other on the pattern of
nucleotide incorporation obtained. The calculation of allele frequency
[can] thus be performed as follows:

    Allele frequency (allele 2) =
        Peak height (allele 2) / [Peak height (allele 2) + Peak height (allele 1)] x 100%

Therefore if the SNP is C/T the calculation would be performed thus:

    Allele frequency T = Peak height T / (Peak height T + Peak height C) x 100%

Thus, it is possible to obtain accurate, cost-effective and
rapid information on SNP allele frequencies in a population
using nucleic acid pooling and primer-extension reactions,
by monitoring nucleotide incorporation.
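
The allele frequency calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the peak heights used below are hypothetical, and averaging over sub-pools is shown as a plain arithmetic mean, which is an assumption about how the replicates are combined.

```python
from statistics import mean


def allele_frequency(peak_allele, peak_other):
    """Frequency (percent) of the allele whose peak height is peak_allele."""
    total = peak_allele + peak_other
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * peak_allele / total


def mean_allele_frequency(replicates):
    """Average the frequency over sub-pools given (peak_allele, peak_other) pairs."""
    return mean(allele_frequency(a, b) for a, b in replicates)


# Hypothetical peak heights (RLU) for a C/T SNP in one pool.
peak_t, peak_c = 30.0, 10.0
freq_t = allele_frequency(peak_t, peak_c)
freq_c = allele_frequency(peak_c, peak_t)

# Hypothetical replicate measurements (T peak, C peak) from three sub-pools.
sub_pools = [(30.0, 10.0), (28.0, 12.0), (32.0, 8.0)]
avg_t = mean_allele_frequency(sub_pools)
```

Because the two frequencies are computed from the same pair of peaks, they always sum to 100%, which gives a quick sanity check on the recorded peak heights.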

The method of the invention relies upon the knowledge of 
the location and potential variants of the SNP or allelic 
variant, together with further known sequence information 
(e.g. with known sequences of conserved/semi-conserved 
regions) from which to determine an appropriate primer 
binding site and design a complementary extension primer. 
Using the method of the invention, the allele frequency of 
any SNP or allelic variant may be determined, whether 
present in coding or non-coding regions. 

The invention also comprises kits for carrying out the 
method of the invention. These will normally include one or 
more of the following components: 

optionally primer(s) for in vitro amplification; a primer 
for the primer extension reaction; nucleotides for 
amplification and/or for the primer extension reaction (as described 
above); a polymerase enzyme for the amplification and/or 
primer extension reaction; and means for detecting primer 
extension (e.g. means of detecting the release of 
pyrophosphate as outlined and defined above). 

The invention will now be described by way of nonlimiting 
examples. 

EXAMPLE 1 

Templates and Primers 

These examples used DNA from 3 different sources which 
was either extracted from cell lines or from genomic 
sources. In total, DNA from 122 individual sources was 
used. The concentration of nucleic acid in some of the 
samples had been determined previously by measurement of 
absorbance at a wavelength of 260 nm. These samples were 
diluted to 2 ng/µl based on the absorbance measurements 
and the samples were either pooled directly, or after 
concentration calibration. 

Some examples were performed on template oligonucleotides 
instead of PCR products. These oligonucleotides were 
obtained from Interactiva, Ulm, Germany. 

PCR amplification primers and sequencing primers were 
designed using Oligo 6.0 (Med Probe AS, Oslo, Norway). 
All primers were ordered from Interactiva (supra). 



TABLE 1 

Primers and SNP definitions 

Eu1 (ACP-240): fragment length 158 bp; sequencing output A/T
  Upstream primer E1a (SEQ ID NO: 3):    5'-Biotin-ggt cgg gct ggg aag at-3'
  Downstream primer E1b (SEQ ID NO: 4):  5'-gct ccc gca gag gaa gc-3'
  Sequencing primer E1s (SEQ ID NO: 5):  5'-aga aag ggc ctc ctc tct tt-3'

Eu4 (ACEex15): fragment length 145 bp; sequencing output A/G
  Upstream primer E4a (SEQ ID NO: 6):    5'-gcc agg aag ttt gat gtg aac-3'
  Downstream primer E4b (SEQ ID NO: 7):  5'-Biotin-gat tcc cct ctc cct gta cct-3'
  Sequencing primer E4s (SEQ ID NO: 8):  5'-gac cta gaa cgg gca gc-3'

Eu7 (ANP1218): fragment length 142 bp; sequencing output C/T
  Upstream primer E7a (SEQ ID NO: 9):    5'-Biotin-tga tgt aac cct cct ctc ca-3'
  Downstream primer E7b (SEQ ID NO: 10): 5'-cgg ctt acc ttc tgc tgt agt-3'
  Sequencing primer E7s (SEQ ID NO: 11): 5'-acg gca gct tct tcc cc-3'

460R: fragment length 101 bp; sequencing output CC/T
  Upstream primer PSO 145 (SEQ ID NO: 12):   5'-B-ggc tgc tgt tct gaa acc atc tga-3'
  Downstream primer PSO 146 (SEQ ID NO: 13): 5'-ttc agg aac gcg ggc aag tc-3'
  Sequencing primer PSO 147 (SEQ ID NO: 14): 5'-gag cag tcc cca ccc-3'

461R: fragment length same as 460R; sequencing output C/TT
  Upstream primer:   same as 460R
  Downstream primer: same as 460R
  Sequencing primer PSO 148 (SEQ ID NO: 15): 5'-gcg ggc aag tcc aat-3'

465R: fragment length 85 bp; sequencing output C/T
  Upstream primer PSO 149 (SEQ ID NO: 16):   5'-B-gga aca ctg cct ccc act ttc tt-3'
  Downstream primer PSO 150 (SEQ ID NO: 17): 5'-tcc cca tgc agc cct aga gac-3'
  Sequencing primer PSO 151 (SEQ ID NO: 18): 5'-gga gaa gtc cag tgt gc-3'

466F: fragment length 111 bp; sequencing output C/T/G
  Upstream primer PSO 182 (SEQ ID NO: 19):   5'-ttc caa agg acg cga cca taa-3'
  Downstream primer PSO 183 (SEQ ID NO: 20): 5'-B-cct gca ccc cag acc act ga-3'
  Sequencing primer PSO 184 (SEQ ID NO: 21): 5'-tag ctg cgc ggg aa-3'

470R: fragment length 102 bp; sequencing output C/A
  Upstream primer PSO 155 (SEQ ID NO: 22):   5'-B-cct acc cac agg cca gaa-3'
  Downstream primer PSO 156 (SEQ ID NO: 23): 5'-gcc tgg gac ctc act gtc-3'
  Sequencing primer PSO 157 (SEQ ID NO: 24): 5'-gga gac aga atg ctg at-3'

471F: fragment length 126 bp; sequencing output CCC/T
  Upstream primer PSO 158 (SEQ ID NO: 25):   5'-gtt gcc ctc tgg ttc cac ct-3'
  Downstream primer PSO 159 (SEQ ID NO: 26): 5'-B-tgt ctc cag cag ctc ctt cat c-3'
  Sequencing primer PSO 160 (SEQ ID NO: 27): 5'-gcc cag gaa gga ac-3'

481R: fragment length 110 bp; sequencing output T/G
  Upstream primer PSO 167 (SEQ ID NO: 28):   5'-B-gat gct gta aca gag acc cca ta-3'
  Downstream primer PSO 168 (SEQ ID NO: 29): 5'-ctg gga tta cag gtg tga aca ct-3'
  Sequencing primer PSO 169 (SEQ ID NO: 30): 5'-tag gag caa gaa gta aac-3'

486R: fragment length 115 bp; sequencing output TT/C
  Upstream primer PSO 173 (SEQ ID NO: 31):   5'-B-caa ggt aga gaa gtg cag cat tca-3'
  Downstream primer PSO 174 (SEQ ID NO: 32): 5'-ttg att ctc ttt gag ccc aga tgt-3'
  Sequencing primer PSO 175 (SEQ ID NO: 33): 5'-gcc tgg agc tgt taa t-3'

1000F: fragment length 159 bp; sequencing output CC/T
  Upstream primer PSO 194; downstream primer PSO 195; sequencing primer PSO 196

3345F: fragment length 120 bp; sequencing output A/GGGG
  Upstream primer PSO 199; downstream primer PSO 200; sequencing primer PSO 201


TABLE 2 

Oligonucleotides used to create "artificial" SNPs. 

Oligo 1: sequencing output CCCC/T
  PSO43SNP (SEQ ID NO: 34): AGTCATGGTGCTGGGGCACTGGCCGTCGTTTTACAACG
  PSO44SNP (SEQ ID NO: 35): AGTCATGGTGCTAGGGCAGTGGCCGTGGTTTTTACAACG

Oligo 2: sequencing output CCCCC/T
  PSO44SNP (SEQ ID NO: 36): AGTCATGGTGCTGGGGGCACTGGCCGTCGTTTTACAACG
  PSO45SNP (SEQ ID NO: 37): AGTCATGGTGCTAGGGGCACTGGCCGTCGTTTTACAACG

Oligo 3: sequencing output CCCCC/TTT
  PSO53SNP (SEQ ID NO: 38): AGTCATGGTGCTAAGGGGGCACTGGCCGTCGTTTTACAACG
  PSO54SNP (SEQ ID NO: 39): AGTCATGGTGCTAAAGGGGCACTGGCCGTCGTTTTACAACG

Sequencing primer
  PSO55NUSPT (SEQ ID NO: 40): CGT TGT AAA ACG ACG GC

PCR Amplification 

All fragments in the examples were amplified with the 
AmpliTaq Gold Kit (Applied Biosystems) and 2 mM MgCl2, 
according to the following protocol: 

PCR mix                        1x mix (µl)
GeneAmp 10x PCR buffer II      5
MgCl2 (25 mM)                  4
dNTP (2.5 mM)                  2.5
DMSO                           0
Primer a (10 µM)               1
Primer b (10 µM)               1
TaqGold (5 U/µl)               0.3
H2O                            31.2
Sum                            45

Approximately 10 ng genomic DNA was added to 45 µl 
of PCR mix to make a total PCR volume of 50 µl. The PCR 
cycling conditions were as follows: 95 °C for 5 minutes, 45 
cycles of (95 °C for 15 seconds, Ta °C for 30 seconds, 72 °C 
for 15 seconds), 72 °C for 5 minutes, 4 °C. For SNPs Eu1, Eu4 
and Eu7 Ta=57 °C. Otherwise Ta=60 °C. 
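
As a cross-check of the PCR mix listed above, the following Python sketch sums the per-reaction volumes, derives the final MgCl2 concentration in the 50 µl reaction, and scales the recipe to several reactions. The `master_mix` helper and its `overage` parameter are hypothetical conveniences that do not appear in the protocol, and the 5 µl template volume is inferred from the 45 µl mix and 50 µl total.

```python
# Per-reaction volumes in microlitres, as listed in the PCR mix table.
PCR_MIX_UL = {
    "GeneAmp 10x PCR buffer II": 5.0,
    "MgCl2 (25 mM)": 4.0,
    "dNTP (2.5 mM)": 2.5,
    "DMSO": 0.0,
    "Primer a (10 uM)": 1.0,
    "Primer b (10 uM)": 1.0,
    "TaqGold (5 U/ul)": 0.3,
    "H2O": 31.2,
}
TEMPLATE_UL = 5.0  # inferred: 50 ul total reaction minus 45 ul of mix


def master_mix(n_reactions, overage=0.0):
    """Scale each component to n_reactions; overage is a hypothetical safety margin."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in PCR_MIX_UL.items()}


mix_volume_ul = sum(PCR_MIX_UL.values())
total_volume_ul = mix_volume_ul + TEMPLATE_UL
# Final MgCl2 concentration after dilution of the 25 mM stock into the reaction.
final_mgcl2_mM = 25.0 * PCR_MIX_UL["MgCl2 (25 mM)"] / total_volume_ul

ten_reactions = master_mix(10)
```

The component volumes add up to the stated 45 µl, and 4 µl of 25 mM MgCl2 in a 50 µl reaction corresponds to the 2 mM MgCl2 quoted in the text.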

EXAMPLE 2 

DNA Calibration 

In order to calibrate the amount of DNA in each of the 
samples, an SNP was chosen for analysis. SNP 465R was 
chosen; it is a C/T SNP that generates good signals without 
preferential amplification, is not present in a homopolymeric 
stretch and gives no background signals or uneven peak 
heights. All samples were genotyped for the chosen SNP. 



TABLE 3 

Primers used to amplify and sequence SNP 465R. 

SNP ID 465R: fragment length 85; SNP G/A; sequencing output C/T
  Upstream primer (SEQ ID NO: 16):   5'-B-gga aca ctg cct ccc act ttc tt-3'
  Downstream primer (SEQ ID NO: 17): 5'-tcc cca tgc agc cct aga gac-3'
  Sequencing primer (SEQ ID NO: 18): 5'-gga gaa gtc cag tgt gc-3'

The genotyping was performed as follows. 5 µl genomic 
DNA (at a concentration of approximately 2 ng/µl) was 
amplified as described previously in Example 1. 25 µl of the 
PCR product was mixed with 8 µl magnetic beads 
Dynabeads® (Dynal Biotech ASA, Oslo, Norway) (10 µg/µl) and 
17 µl 2xBW buffer (10 mM Tris-HCl, 2 M NaCl, 1 mM 
EDTA, 0.1% Tween 20). The strands were then separated 
using 50 µl 0.5 M NaOH. The sample was then treated with 
1x annealing buffer (20 mM Tris-acetate, 5 mM MgAc2), and 
washed. The beads were transferred to a PSQ 96™ plate 
(Pyrosequencing AB, Uppsala, Sweden) which contained 40 
µl of 1x annealing buffer and 5 µl sequencing primer. A 
sequencing reaction was then performed on a PSQ 96™ 
instrument (Pyrosequencing AB) using SNP reagent kit, 
product number 40-0001 (Pyrosequencing AB). Once the 
genotype of SNP 465R of each sample had been established, 
calibration was performed. 

2.5 µl of sample genomic DNA (at an approximate 
concentration of 2 ng/µl) was added to 2.5 µl reference 
genomic DNA and 45 µl PCR mix added, and PCR 
performed (supra). 

The SNP was then analysed (as for genotyping assay) on 
a PSQ 96™ instrument (Pyrosequencing AB) using 
Pyrosequencing™ reagents (product no 40-0001). 
Calculations and data: 
Reference #1: T/T 
Reference #2: C/C 

Conc (Reference #2) = X(Ref #2) x Conc (Reference #1) 
Conc (sample) = X x Conc (Reference #1) 

Calculation of X(Ref #2) and Y(Ref #2): 
Reference #2 + Reference #1 are pooled: 

    X(Ref #2) = Peak height C / Peak height T
    Y(Ref #2) = Peak height C / (Peak height T + Peak height C)

Calculation of X and Y for all other samples: 

Homozygote C/C sample + Reference #1 are pooled: 

    X = Peak height C / Peak height T
    Y = Peak height C / (Peak height T + Peak height C)

Homozygote T/T sample + Reference #2 are pooled: 

    X = Peak height T / Peak height C
    Y = Peak height T / (Peak height T + Peak height C)

Heterozygote C/T + Reference #1: 

    X = 2 x Peak height C / (Peak height T - Peak height C)
    Y = Peak height C / (Peak height T + Peak height C)

TABLE 4 

Results for some of the calibrated samples. 

Sample   Genotype   Sample mix        Allele   Peak height   Y      X
Ref #2   C/C        ref #2 + ref #1   C        26.25         0.51   1.0
                                      T        25.62
#1       C/C        #1 + ref #1       C        19.68         0.40   0.7
                                      T        30.07
#2       C/T        #2 + ref #1       C        12.65         0.24   0.9
                                      T        41.09
#3       C/T        #3 + ref #1       C        12.64         0.24   1.0
                                      T        39.09
#18      T/T        #18 + ref #2      C        28.05         0.45   0.8
                                      T        23.05
#19      T/T        #19 + ref #2      C        33.78         0.35   0.5
                                      T        18.13

Thus, for further experiments, a given volume of 
reference #1 is put into the pool, and the X and Y values obtained 
for the samples can be used to determine the volume of each 
sample to be added to the pool. 

    Volume (Sample #1) = Volume (Ref #1) / X (Sample #1)

    Volume (Sample #19) = Volume (Ref #1) / X (Sample #19)
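
The relation between X and Y used in these calculations can be made explicit with a short derivation. It rests on assumptions that the text does not spell out: equal volumes of sample and Reference #1 (T/T) are mixed, the reference contributes one unit of T signal, the sample contributes X units in total, and peak height (written h_C and h_T here, symbols introduced only for this sketch) is proportional to the number of allele copies.

```latex
% Homozygote C/C sample pooled with Reference #1 (T/T):
% the sample contributes X units of C, the reference 1 unit of T.
\[
  Y = \frac{h_C}{h_T + h_C} = \frac{X}{1 + X}
  \quad\Longrightarrow\quad
  X = \frac{Y}{1 - Y} = \frac{h_C}{h_T}
\]

% Heterozygote C/T sample pooled with Reference #1 (T/T):
% half of the sample's X units are C, the other half add to the T signal.
\[
  Y = \frac{X/2}{1 + X/2 + X/2} = \frac{X}{2\,(1 + X)}
  \quad\Longrightarrow\quad
  X = \frac{2Y}{1 - 2Y} = \frac{2\,h_C}{h_T - h_C}
\]
```

Under these assumptions the homozygote and heterozygote expressions for X agree with the general formulas X = Y/(1 - Y) and X = 2Y/(1 - 2Y) given in the description.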



TABLE 5 

Calculated X and Y values and thus volume of 
sample to use in pooling nucleic acid samples 

                                                                           Sample
Sample   Genotype   Sample mix        Allele   Peak height   Y      X      volume (µl)
Ref #1   T/T                          C, T                          1.00   50
Ref #2   C/C        ref #2 + ref #1   C        26.25         0.51   1.02   49
                                      T        25.62
#1       C/C        #1 + ref #1       C        19.68         0.40   0.65   76
                                      T        30.07
#2       C/T        #2 + ref #1       C        12.65         0.24   0.90   56
                                      T        41.09
#3       C/T        #3 + ref #1       C        12.64         0.24   0.96   52
                                      T        39.09
#18      T/T        #18 + ref #2      C        28.05         0.45   0.84   59
                                      T        23.05
#19      T/T        #19 + ref #2      C        33.78         0.35   0.55   91
                                      T        18.13



Assessing DNA Calibration

20 samples were chosen. The DNA concentrations had
been determined by using UV absorbance measurements
and diluted to a concentration of 2 ng/µl. The 20 samples had
been individually genotyped for the SNP (465R) using
PSQ™ 96 system. The samples were pooled individually
with a "reference DNA", also from the diversity panel. PCR
was performed to amplify the fragment containing SNP
465R, and sequencing was performed on PSQ™ 96 system.
The concentrations were compared with each other by
calculations on the peak heights, and are tabulated in Table
6, below. Further, two test pools were made (one constructed
using the calibrated concentrations (pool 1) and one using
the original concentrations from UV absorbance measurements
(pool 2)).

TABLE 6

Calculations for DNA concentration adjustment

                                                                                Sample
                                                                                Volume
Sample   Genotype   Sample mix        Allele   Peak height   Y      X     Z     (µl)
Ref #2   C/C        ref #2 + ref #1   C        11.77         0.60   1.5   1.0   15
                                      T        7.79
#1       C/T        #1 + ref #1       C        7.17          0.34   2.2   1.5   10
                                      T        13.63
#2       C/T        #2 + ref #1       C        7.39          0.35   2.4   1.6   9
                                      T        13.44
#3       C/C        #3 + ref #1       C        11.42         0.60   1.5   1.0   15
                                      T        7.72
#4       C/T        #4 + ref #1       C        6.77          0.37   2.9   1.9   8
                                      T        11.5
#5       C/T        #5 + ref #1       C        8.4           0.41   4.5   3.0   5
                                      T        12.13
#6       C/C        #6 + ref #1       C        9.02          0.52   1.1   0.7   21
                                      T        8.39
#7       C/T        #7 + ref #1       C        8.14          0.38   3.0   2.0   7
                                      T        13.52
#8       C/T        #8 + ref #1       C        8.47          0.42   5.2   3.5   4
                                      T        11.71
#9       C/T        #9 + ref #1       C        8.02          0.39   3.5   2.3   6
                                      T        12.61
#10      C/T        #10 + ref #1      C        6.71          0.29   1.4   0.9   16
                                      T        16.17
#11      C/T        #11 + ref #1      C        6.25          0.30   1.5   1.0   15
                                      T        14.44
#12      C/C        #12 + ref #1      C        14.2          0.66   1.9   1.3   12
                                      T        7.39
#13      C/T        #13 + ref #1      C        7.84          0.37   2.9   1.9   8
                                      T        13.21
#14      C/T        #14 + ref #1      C        6.67          0.36   2.7   1.8   8
                                      T        11.63
#15      C/T        #15 + ref #1      C        3.08          0.20   0.7   0.4   34
                                      T        12.31
#16      C/C        #16 + ref #1      C        11.82         0.56   1.3   0.8   18
                                      T        9.29
#17      C/C        #17 + ref #1      C        15.91         0.73   2.7   1.8   8
                                      T        5.96
#18      T/T        #18 + ref #2      C        12.91         0.42   0.7   0.7   21
                                      T        9.41
#19      T/T        #19 + ref #2      C        11.52         0.44   0.8   0.8   19
                                      T        8.88
According to previous calculations for SNP465R, the
observed differences in DNA concentrations would not have
had any detectable impact on the allele frequency measurement
for 465R in these pools. Expected allele frequency for
the T-allele was 40% in pool 1 and 41% in pool 2, which is
an undetectable difference. Therefore, two further SNPs
were selected to test the pools, SNP 461R and 470R. The
difference between the two pools was expected to be 3% for
both SNPs and that is a detectable difference.

For both pools, the estimated allele frequencies were in
good accordance with what was expected, see FIG. 1 and
Table 7. The experiment showed that it is possible to use
Pyrosequencing™ as a method to calibrate DNA concentrations
before pooling DNA. Further, the calibrated pool was
more in accordance with the theoretical frequencies, as
determined from individual genotypes (10% for 461R and
55% for 470R).

TABLE 7

Measured allele frequencies and STD for each pool compared to the
theoretically calculated frequencies of the DNA pools.

               461R     461R     470R     470R
               Pool 1   Pool 2   Pool 1   Pool 2
Replicate 1    8.5      5.9      64.7     56.9
Replicate 2    6.1      7.2      55.8     54.1
Replicate 3    6.6      8.1      59.3     58.1
Replicate 4    9.3      4.8      51.6     59.8
Replicate 5    8.3      3.5      55.3     56.5
Replicate 6    6.7      5.6      56.1     59.2
Replicate 7    10.2     4.7      54.3     62.8
Replicate 8    7.1      6.6      57.1     58.5
Replicate 9    6.6      6.3      55.2     54.7
Replicate 10   6.9      3.8      57.4     55.5
average        7.6      5.6      56.8     57.6
calculated     10.0     7.0      55.0     58.0
STD            1.3      1.3      3.5      2.5

Therefore, this method of sequencing can also be used
reliably for the calibration of relative concentrations in a
pool of nucleic acid. This has applications for all
sequencing-by-synthesis protocols.

EXAMPLE 3

SNP Analysis Protocol

The pooled DNA (calibrated according to Example 2, or
of known concentration) was added to 45 µl PCR mix
(supra) and amplified as described previously. 25 µl of the
PCR product was mixed with 8 µl magnetic beads
(Dynabeads®, Dynal Biotech ASA, Oslo, Norway) (10 µg/µl) as
described in Example 2. Annealing of the primer to the
template DNA was performed with 15 pmol sequencing
primer, for 2 minutes at 80° C. The samples were allowed to
cool to room temperature and the primer extension reaction
was performed on a PSQ™ 96 instrument (Pyrosequencing
AB) using SNP reagent kit (Pyrosequencing AB). Once the
peak height data was collected for the DNA pool, the allele
frequency can be calculated as follows if the SNP is not
present in a homopolymeric stretch:

Allele frequency (Allele 2) =
    Peak Height (Allele 2) / (Peak Height (Allele 2) + Peak Height (Allele 1)) × 100

EXAMPLE 4

Pooling Strategies

It is important to determine whether it is more preferable
to pool genomic DNA or PCR product, as experimental
variance can be expected once PCR amplification of the
genomic DNA has been performed. Thus, the SNP Eu7
(A/G) was investigated, by sequencing the SNP in reverse
(T/G).

Ninety samples were individually genotyped for Eu7 and
thereafter pooled either before or after PCR amplification,
with five replicate reactions performed for each pool. The
expected allele frequency is 27% G. The experiment was
repeated in 3 subset populations (30-40 samples out of the
90) with lower allele frequencies (15% G, 10% G and 5% G,
respectively).

For each replicate of a genomic DNA- or PCR-pool, 40 µl
PCR product was incubated with 15 µl magnetic beads (10
µg/µl) and 25 µl 2xBW buffer. The resulting single-peak
height levels were about 40-60 RLU. The theoretical allele
frequency values (determined from the individual sample
genotypes) in the four tested sample sets were 27% G, 15%
G, 10% G, and 5% G respectively.

Pooling of PCR products resulted in good estimates of
allele frequencies in all four pools (26%, 17%, 11%, and 7%
respectively), and with low variance between replicate
sequencing reactions. Pooling of genomic DNA resulted in
accurate results (28%, 17%, 12%, and 6% respectively), but
with slightly larger variation between replicate pools.

The experiment indicated that pooling of genomic DNA is
possible with the same accuracy as can be obtained with
pooled PCR products. However, the replicate PCR
amplifications on the genomic DNA pool introduce additional
experimental variance. Pooling of genomic DNA may
therefore require testing more replicate pools to obtain the same
accuracy as when pooling PCR products.

It can also be concluded that 5% of the G-allele could be
reliably detected, showing that even low allele frequencies
are capable of measurement using the method of the
invention.

FIG. 2a represents graphically the allele frequency results
for 5 replicate PCR products on each of 4 pools. It can be
seen that the estimated allele frequency (%) is in close
correlation with the measured frequency. FIG. 2b shows
graphically the allele frequency results for pooled genomic
DNA, 5 replicate reactions per pool. Although the measured
allele frequency is slightly more variable for the genomic
DNA when compared to the PCR products, the calculated
mean values were still in close agreement with the estimated
frequency.

Pooling of Genomic DNA

Ninety samples were individually genotyped for five
different SNPs: one A/G-SNP (Eu4), one tri-allelic SNP
(466F), one simple C/T-SNP (465R), one C/T-SNP followed
by a T (461R), and one A/C-SNP (470R). A pool containing
ninety genomic DNA samples was created without calibration
of the DNA concentrations and therefore differed
slightly in individual DNA concentrations. For Eu4, five
replicate PCR reactions were performed. For the other four
SNPs, ten replicate PCR reactions were used. All PCR
amplifications were performed with 10 ng genomic DNA as
starting material in the PCR reaction. For Eu4, 40 µl PCR
product was used for sequencing. For the other four SNP
assays, 30 µl of each PCR product was used for sequencing.
The average allele frequencies and standard deviations were
calculated.
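
The allele frequency calculation from peak heights, and the averaging over replicate reactions, can be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def allele_frequency(peak_allele2, peak_allele1):
    """Allele 2 frequency in percent for a SNP outside a homopolymeric stretch."""
    total = peak_allele1 + peak_allele2
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * peak_allele2 / total


def summarize_replicates(frequencies):
    """Average and sample standard deviation over replicate reactions."""
    return mean(frequencies), stdev(frequencies)
```

For example, peak heights of 25 RLU (allele 2) and 75 RLU (allele 1) correspond to an allele 2 frequency of 25%.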
Results on allele frequencies were calculated for five
different SNPs, the results for which are tabulated below:

TABLE 8

Results from pooling experiments

                                      Expected     Measured
SNP    Sequence                       Frequency    Frequency
466F   [C/T/G]AAGGTTGTCCT             C 38.1%      C 40.8%
       (SEQ ID NO: 1)                 T 37.5%      T 32.1%
                                      G 24.4%      G 27.1%
465R   [C/T]GTTCCACCT                 C 64.4%      C 65.1%
       (SEQ ID NO: 2)                 T 35.6%      T 34.9%
461R   [C/T]TGCAGA                    C 92.2%      C 96.5%
                                      T 7.8%       T 3.5%
470R   T[C/A]TCTGG                    C 28.9%      C 28.2%
                                      A 71.1%      A 71.8%
Eu4    [A/G]CTGCCT                    G 56.7%      G 56.0%
                                      A 43.3%      A 44.0%

The sequencing results are shown as "pyrograms"™
(FIGS. 3a, 3b, 3c, 3d and 3e), wherein the peak height
resulting from nucleotide addition is measured. No concentration
calibration was performed for this experiment, and
therefore different amounts of the individual nucleic acid
samples were added to the pool. In view of this, the results
are remarkably close to the estimated allele frequency for
each pool. The standard deviation values for the results were
between 0.8 and 1.8, which was found to be comparable
with previous allele frequency experiments.

The result for the SNP 461R, which contains a T residue
in a stretch of 2 T residues, showed a lower value than
expected. From further experimentation, this result turned
out to be consistent for this allele, probably due to the fact
that the SNP was present in a homopolymeric stretch.

The pyrogram™ for SNP Eu4 (FIG. 3e) shows very high
and wide peaks. This was due to the use of 40 µl of PCR
product.

Detecting Allele Frequency Differences Between Pools

Four sample pools, composed of 39-90 genomic DNA
samples, were constructed for both SNP 465R and SNP
461R. DNA concentration calibration was not performed
before pooling. Allele frequencies were measured for 10
replicate reactions of each pool. 10 ng genomic DNA was
used in a 50 µl PCR reaction and 30 µl of the PCR product
was used for the primer extension reactions. The average
allele frequencies and standard deviations were calculated.
95% and 99% confidence intervals were also estimated for
the measured allele frequencies.

As previously observed, the measured frequencies for the
T-allele of SNP 461R are too low. However, the deviation
proved to be consistent, enabling detection of even small
differences in allele frequencies between pools. The smallest
sample pool, SNP465R:4 with 39 samples, showed the
largest deviation from the expected frequency, indicating the
importance and difficulty of DNA pool construction.
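
The text does not state how the 95% and 99% confidence intervals were estimated. One plausible reading, sketched below in Python under that assumption, is a Student's t interval on the mean of the 10 replicates (9 degrees of freedom); the function name and the hard-coded critical values are illustrative choices rather than the method of the source.

```python
import math

# Two-sided Student's t critical values for 9 degrees of freedom (n = 10).
T_CRIT_DF9 = {0.95: 2.262, 0.99: 3.250}


def confidence_interval(mean_pct, std_pct, level=0.95, n=10):
    """Confidence interval for a mean allele frequency over n replicates."""
    if n != 10:
        raise ValueError("critical values are only tabulated here for n = 10")
    half_width = T_CRIT_DF9[level] * std_pct / math.sqrt(n)
    return mean_pct - half_width, mean_pct + half_width
```

Applied to a pool with a mean of 34.9% T and a standard deviation of 0.9%, this yields intervals of about 34.3-35.5 (95%) and 34.0-35.8 (99%), which agrees with the values tabulated for SNP465R:1 in Table 10.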

TABLE 9

Pool ID and % T calculated values

Pool ID      Pool Size (N)   % T
SNP465R:1    90              35.6
SNP465R:2    71              33.7
SNP465R:3    55              30.6
SNP465R:4    39              25.0
SNP461R:1    90              7.8
SNP461R:2    80              9.8
SNP461R:3    67              12.8
SNP461R:4    58              17.8


TABLE 10

Results for SNP465R and SNP461R

                                % T [95% Conf.   % T [99% Conf.
Pool ID      % T    Std [%]     Interval]        Interval]
SNP465R:1    34.9   0.9         34.3-35.5        34.0-35.8
SNP465R:2    31.6   1.4         30.6-32.6        30.2-33.0
SNP465R:3    28.6   0.7         28.1-29.1        27.9-29.3
SNP465R:4    27.3   1.4         26.3-28.3        25.9-28.7
SNP461R:1    3.5    1.2         2.6-4.4          2.3-4.7
SNP461R:2    6.1    0.9         5.5-6.7          5.2-7.0
SNP461R:3    8.6    1.6         7.5-9.7          7.0-10.2
SNP461R:4    15.4   1.3         14.5-16.3        14.1-16.7



EXAMPLE 5 

Peak Height Linearity 

To establish that a correlation exists between peak heights 
obtained in a primer-extension reaction, and the underlying 
allele frequency, 3 SNPs were investigated, Eul, Eu4 and 
Eu7. The DNA samples were amplified according to 
Example 1. Following PCR amplification, 2 homozygote
samples were mixed in proportions in 5% increments from 
0% to 100% (i.e. 0:100, 5:95, . . . , 100:0). The primer- 
extension reaction was performed according to Example 3, 
and the allele frequencies calculated. 5 pmol PCR product
was used for each primer-extension reaction, resulting in 
single peak height levels that were about 30-40 RLU 
(relative light units). The peak heights in RLU were plotted 
against the expected allele frequencies (FIGS. 4a, 4b and 
4c). A linear relationship over the complete range of tested 
allele frequencies was confirmed. Thus, the correlation 
between primer-extension peak heights and SNP allele fre- 
quencies is excellent. FIG. 5 depicts the linear relationship 
between allele frequency and peak height, and shows the 
peak height results for 3 primer extension reactions: 25% C, 
50% C and 75% C. 
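
A linearity check of this kind can be sketched as an ordinary least-squares fit of peak height against expected allele frequency. The Python below is a minimal illustration; the function name is hypothetical and the data in the usage note are made-up values, not measurements from the experiment.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope, intercept and Pearson r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

For hypothetical expected frequencies of 0, 25, 50, 75 and 100% with peak heights of 0, 10, 20, 30 and 40 RLU, the fit returns a slope of 0.4 RLU per percent, an intercept of 0 and r = 1. A correlation coefficient close to 1 over the full range is what a linear relationship would look like.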

SNPs Present in Homopolymeric Stretches 

To establish whether the presence of a homopolymeric 
stretch over an SNP alters the applicability of the method of 
the invention, primer-extension reactions were performed 
for 3 SNPs. Synthesized oligonucleotides (Interactiva, 
supra) were used in order to obtain an SNP where both 
alleles are located in a homopolymer, or where the SNP lies 
in a homopolymer of 3 or more identical residues. 

Prior to all experiments, the DNA pools were calibrated 
using the method described in Example 2. For each SNP, 10 
replicates of individual genotypes were analyzed in order to 
obtain reference data for comparison with the pools. The 
following SNPs were investigated: 

1000F is a C/T-SNP which is preceded by a C. 24 samples
were used to create five pools with different expected allele
frequencies (3.8% C, 7.1% C, 10% C, 31.2% C and 39.4%
C). In the experiment, ten replicates were analyzed for each
pool.

345F is an A/G-SNP followed by GGG. 24 samples were
used to create two pools with an expected allele frequency
of 26% A and 10% A respectively. Both pools were
sequenced with two different dispensation orders to achieve
either two or three peaks for the SNP. In the experiment, ten
replicates were analyzed for each pool.

SNP471F is a C/T SNP preceded by CC. Eight samples
were used to create four different pools with an expected
allele frequency of 4.5% T, 8% T, 21% T and 31% T
respectively. In the experiment, ten replicates were analyzed
for each pool.

Oligo 1, Oligo 2 and Oligo 3 are artificially created SNPs
that were made by mixing two oligonucleotides that only
differ in one base (see Table 2). The two differing
oligonucleotides were in each case mixed together with the
following ratios: 5:95, 10:90, 20:80, 50:50, 80:20, 90:10 and
95:5. Oligo 1 is a C/T SNP preceded by CCC, Oligo 2 is a
C/T SNP preceded by CCCC, and Oligo 3 is a C/T SNP
preceded by CCCC and followed by TT.

Results

1. SNP 1000F (CC/T)

Prior to the experiment this SNP was also used to calibrate
the samples for the DNA pools. 30 µl of PCR product was
incubated with 10 µl magnetic beads and 20 µl 2xBW-buffer.
Pool 1 and Pool 2 show the difference in allele frequency
between a calibrated pool (Pool 2) and a pool where the
same volume of each sample has been used (Pool 1). Before
the calibration, Pool 1 was expected to have an allele
frequency of 31.2. This was based on the assumption that all
samples were of the same DNA concentration. The calibration
shows that this is not the case and based on the relative
concentrations of the samples it is now possible to
recalculate the expected allele frequency of Pool 1 to be 39.4,
which is much closer to the allele frequency that was
obtained in the experiment. The results for these experiments
are represented graphically as FIG. 6.
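
Recalculating the expected allele frequency of a pool from relative concentrations amounts to a weighted allele count. The Python sketch below illustrates the idea under the assumption that each sample contributes alleles in proportion to its relative DNA amount (volume times relative concentration); the function name and the example inputs are hypothetical.

```python
# Number of C alleles carried by each diploid genotype.
C_COPIES = {"C/C": 2, "C/T": 1, "T/T": 0}


def expected_c_frequency(samples):
    """Expected % C in a pool.

    samples: iterable of (genotype, relative_amount) pairs, where
    relative_amount is the DNA amount contributed by that sample.
    """
    samples = list(samples)
    total_alleles = sum(2 * amount for _, amount in samples)
    if total_alleles <= 0:
        raise ValueError("pool must contain a positive amount of DNA")
    c_alleles = sum(C_COPIES[genotype] * amount for genotype, amount in samples)
    return 100.0 * c_alleles / total_alleles
```

With equal amounts of one C/C and one T/T sample the expected frequency is 50% C; if the C/C sample is in fact twice as concentrated, the same equal-volume pool is expected to show about 66.7% C. This is the kind of shift the recalculation for Pool 1 accounts for.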



TABLE 11

The obtained allele frequencies for 1000F
compared to the expected frequencies and the
STD for each pool.

Replicate   Pool 1   Pool 2   Pool 3   Pool 4   Pool 5
1           40.9     31.5     12.2     11.3     9.1
2           43.4     35.2     14.8     12.3     9.9
3           43.6     34.1     14.1     13.0     8.8
4           42.0     35.9     14.0     11.9     8.9
5           42.2     37.4     14.8     11.9     8.9
6           43.1     34.3     11.3     12.8     8.7
7           43.4     36.1     13.1     11.7     7.3
8           45.1     32.7     13.0     12.5     7.4
9           39.1     34.0     14.3     12.5     9.3
10          46.6     33.4     13.6     9.3      8.9
average     42.9     34.4     13.5     11.9     8.7
expected    39.4     34.2     10       7.1      3.8
STD         2        1.66     1.09     1        0.76



2. SNP 345F (A/GGGG)

30 µl of PCR product was incubated with 10 µl of
magnetic beads and 20 µl of 2xBW-buffer. Two pools were
made with the expected allele frequencies of 10% A and
26% A.

A comparison was made between a dispensation order
(i.e. order of addition of nucleotides in the primer extension
reaction) that generates two peaks and one that generates
three peaks if the sample is a heterozygote. The small
differences in allele frequency between the two different
dispensation orders indicate that the result is not
significantly influenced by whether the SNP has two or three
informative peaks. The results are depicted graphically as
FIGS. 8a and 8b.

In this SNP the A-peak reduction factor was set to 80%
due to the higher peak obtained when using modified dATP
(dATPαS). This was based on calculations of allele
frequencies in a run with individual samples. (The individual
samples were run with a dispensation order that generates
three peaks.) Despite this adjustment, the SNP does not
show a completely linear relationship between peak heights
and allele frequency for individual samples. The obtained
pool results are higher than expected, with the largest
aberration in the lower frequencies. If the pool results are
compared with the frequencies for 345F in individual
samples (FIG. 7) one can see that the pattern is similar.
However, it is difficult to do any allele frequency studies on
a SNP that is not linear. The results for this SNP are depicted
graphically as FIG. 7. The standard line shows an imaginary
pattern for an "ideal" SNP.

TABLE 12

The obtained allele frequencies for 345F
compared to the expected frequencies and the
STD for each pool.

            Pool 1    Pool 1    Pool 2    Pool 2
Replicate   2 peaks   3 peaks   2 peaks   3 peaks
1           36.0      35.7      14.5      15.5
2           35.8      33.7      17.2      17.2
3           34.5      34.6      13.6      16.3
4           36.6      35.2      15.2      15.8
5           33.2      32.9      11.4      12.4
6           34.1      35.1      12.2      13.9
7           33.7      35.0      12.7      15.4
8           32.8      35.5      12.5      16.1
9           35.7      31.2      14.4      16.8
10          34.0      33.7      13.6      15.6
average     34.6      34.3      13.7      15.5
expected    26        26        10        10
STD         1.23      1.33      1.6       1.35
3. SNP471F (CCC/T)

30 µl of PCR product was incubated with 10 µl of
magnetic beads and 20 µl 2xBW-buffer. Four pools were
made with the expected allele frequencies of 68.7% C,
78.6% C, 91.7% C and 95.5% C.

TABLE 13

The obtained allele frequencies for SNP471F
compared to the expected frequencies and the
STD for each pool. The results are depicted
graphically as FIG. 9. The standard line
shows an imaginary pattern for an "ideal" SNP.

Replicate   Pool 1   Pool 2   Pool 3   Pool 4
1           64.0     76.6     87.6     93.1
2           61.2     73.3     86.1     91.7
3           62.3     76.9     86.0     92.0
4           66.0     76.7     86.7     91.0
5           65.3     79.8     85.5     91.9
6           57.5     77.3     86.3     90.0
7           68.6     79.3     85.6     90.1
8           68.0     78.2     84.3     92.0
9           70.5     74.5     88.2     90.7
10                                     91.1
average     64.8     77.0     86.2     91.5
expected    68.7     78.6     91.7     95.5
STD         3.83     1.96     1.1      0.81
4. Oligo 1 (CCCC/T), Oligo 2 (CCCCC/T) and Oligo 3
(CCCCC/TTT)

The two oligonucleotides used for each artificial SNP
were mixed in different ratios to a final concentration of 1
pmol/µl. 2 µl of each mix were annealed with 10 pmol of
sequencing primer in a volume of 45 µl.

The obtained average allele frequencies for Oligo 1 and 2
(FIG. 10b) are within 10% of the expected frequencies,
although the results do not seem to be completely linear.
Oligo 3 (FIG. 10c) shows that a SNP with two homopolymeric
stretches cannot be expected to give reliable allele
frequencies; it is far from the expected frequencies. A
cumulative representation of the results is shown as FIG.
10d.

EXAMPLE 6

Template Quantity

It is important to use the correct amount of nucleic acid in
order to reliably estimate allele frequency. To investigate the
amount of genomic DNA required prior to amplification, the
SNP465R was investigated. 10 ng, 1 ng, 0.1 ng and 0.05 ng
DNA was added in 4 PCR amplifications and subsequent
primer-extension reactions. Four DNA pools were created
from genomic DNA, with allele frequencies of 31% C, 19%
C, 12.5% C and 6% C. Standard calibration was performed.
20 µl of PCR product was used in primer-extension.

Results

The experiment showed a significant correlation between
the amount of DNA used in the PCR reaction and the variation
between replicates. In samples where 10 ng DNA was used
in the PCR, the deviations between replicates were small but
increased quickly when the template amount was lowered.
But even for samples where only 0.05 ng DNA was used,
the average allele frequencies of 10 replicates were in good
accordance with the expected. A template amount of at least
10 ng is required for a reliable allele frequency quantification
if only one or few replicates are used. If many replicates
are amplified, the average allele frequency will be correct
even with lower DNA amount but the variation between
replicates will be significant. The results are depicted
graphically in FIGS. 11a, b, c and d.

Required Signal Level

The height of the peak measured during primer-extension
is correlated to many factors, including the amount of PCR
product used. In order to determine the threshold signal level
to calculate allele frequencies, several experiments were
performed. Four different SNPs with different expected
allele frequencies were used: one C/A-SNP (470R), one
T/G-SNP (481R), one T/G-SNP with a T before the SNP
(486R) and one G/T-SNP with a G before the SNP (460R).
For SNP 470R, a pool was created of several genomic
samples. The expected allele frequency was 55% A in this
pool. For the other SNPs a different pool of samples was
used. The expected allele frequencies in that pool were 19.5%
G for SNP481R, 12.5% G for SNP486R and 6% G for
SNP460R.

Results

The peak heights do not seem to affect the allele frequency
results in any dramatic way. If the single peak height
is below 10 RLU, the signal-to-noise ratio might be too low
for the SNP if one of the alleles is represented at a low
frequency. Although quite small, the variation between
replicate reactions seems to increase slightly when the
average single-peak height level gets below 15 RLU. The
results are represented graphically as FIG. 12.
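
The two signal levels mentioned above can be turned into a simple quality flag. The Python sketch below is only an illustration: the function name and the three labels are hypothetical, and the 10 and 15 RLU cut-offs are taken from the observations in the text rather than from any stated acceptance criterion.

```python
def signal_quality(single_peak_rlu):
    """Classify a single-peak height (RLU) against the observed signal levels."""
    if single_peak_rlu < 0:
        raise ValueError("peak height cannot be negative")
    if single_peak_rlu < 10:
        return "low"       # signal-to-noise may be too low for rare alleles
    if single_peak_rlu < 15:
        return "marginal"  # replicate variation starts to increase slightly
    return "ok"
```

A run with a single-peak height of 12 RLU would be flagged "marginal" under these assumptions.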

All references cited herein are incorporated herein in their 
entirety. 



SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 40

<210> SEQ ID NO 1
<211> LENGTH: 12
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: n can be c, t or g
<400> SEQUENCE: 1

naaggttgtc ct                                                   12

<210> SEQ ID NO 2
<211> LENGTH: 10
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: n is c or t
<400> SEQUENCE: 2

ngttccacct                                                      10
<210> SEQ ID NO 3
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E1a
<400> SEQUENCE: 3

ggtcgggctg ggaagat                                              17

<210> SEQ ID NO 4
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E1b
<400> SEQUENCE: 4

gctcccgcag aggaagc                                              17

<210> SEQ ID NO 5
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E1s
<400> SEQUENCE: 5

agaaagggcc tcctctcttt                                           20

<210> SEQ ID NO 6
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E4a
<400> SEQUENCE: 6

gccaggaagt ttgatgtgaa c                                         21

<210> SEQ ID NO 7
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E4b
<400> SEQUENCE: 7

gattcccctc tccctgtacc t                                         21

<210> SEQ ID NO 8
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E4s
<400> SEQUENCE: 8

gacctagaac gggcagc                                              17

<210> SEQ ID NO 9
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E7a
<400> SEQUENCE: 9

tgatgtaacc ctcctctcca                                           20

<210> SEQ ID NO 10
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E7b
<400> SEQUENCE: 10

cggcttacct tctgctgtag t                                         21

<210> SEQ ID NO 11
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - E7s
<400> SEQUENCE: 11

acggcagctt cttcccc                                              17

<210> SEQ ID NO 12
<211> LENGTH: 24
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - PSO 145
<400> SEQUENCE: 12

ggctgctgtt ctgaaaccat ctga                                      24

<210> SEQ ID NO 13
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: ()..()
<223> OTHER INFORMATION: Primer - PSO 146
<400> SEQUENCE: 13

ttcaggaacg cgggcaagtc                                           20



<210> SEQ ID NO 14 
<211> LENGTH: 15 
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<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 147 

<400> SEQUENCE: 14 

gagcagtccc caccc 15 



<210> SEQ ID NO 15 

<211> LENGTH: 15 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: mis c_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 148 

<400> SEQUENCE: 15 

gcgggcaagt ccaat 15 



<210> SEQ ID NO 16 

<211> LENGTH: 23 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: misc_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 149 

<400> SEQUENCE: 16 

ggaacactgc ctcccacttt ctt 23 



<210> SEQ ID NO 17 
<211> LENGTH: 21 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME /KEY: mis c_f eature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 150 

<4 00> SEQUENCE: 17 

tccccatgca gccctagaga c 21 



<210> SEQ ID NO 18 

<211> LENGTH: 17 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: misc_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 151 

<400> SEQUENCE: 18 

ggagaagtcc agtgtgc 17 



<210> SEQ ID NO 19 

<211> LENGTH: 21 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: misc_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 182 



<400> SEQUENCE: 19 
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ttccaaagga cgcgaccata a 



21 



<210> SEQ ID NO 20 

<211> LENGTH: 20 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: misc_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 183 

<400> SEQUENCE: 20 

cctgcacccc agaccactga 20 



<210> SEQ ID NO 21 
<211> LENGTH: 14 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME /KEY: misc_f eature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 18 4 
<4 00> SEQUENCE: 21 

tagctgcgcg ggaa 14 



<210> SEQ ID NO 22 

<211> LENGTH: 18 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: mis c_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 155 

<400> SEQUENCE: 22 

cctacccaca ggccagaa 18 



<210> SEQ ID NO 23 

<211> LENGTH: 18 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: mis c_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 156 

<400> SEQUENCE: 23 

gcctgggacc tcactgtc 18 



<210> SEQ ID NO 24 

<211> LENGTH: 17 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME /KEY: mis c_f eature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 157 

<400> SEQUENCE: 24 

ggagacagaa tgctgat 17 



<210> SEQ ID NO 25 
<211> LENGTH: 20 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME/KEY: misc_feature 




<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 158 



<400> SEQUENCE: 25 



gttgccctct ggttccacct 20 



<210> SEQ ID NO 26 
<211> LENGTH: 22 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME/KEY: misc_feature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 159 
<400> SEQUENCE: 26 

tgtctccagc agctccttca tc 22 



<210> SEQ ID NO 27 

<211> LENGTH: 14 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 160 

<400> SEQUENCE: 27 

gcccaggaag gaac 14 



<210> SEQ ID NO 28 

<211> LENGTH: 23 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 167 

<400> SEQUENCE: 28 

gatgctgtaa cagagacccc ata 23 



<210> SEQ ID NO 29 
<211> LENGTH: 23 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME/KEY: misc_feature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 168 
<400> SEQUENCE: 29 

ctgggattac aggtgtgaac act 23 



<210> SEQ ID NO 30 
<211> LENGTH: 18 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME/KEY: misc_feature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 169 
<400> SEQUENCE: 30 

taggagcaag aagtaaac 18 



<210> SEQ ID NO 31 
<211> LENGTH: 24 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 173 

<400> SEQUENCE: 31 

caaggtagag aagtgcagca ttca 24 



<210> SEQ ID NO 32 
<211> LENGTH: 24 
<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 
<220> FEATURE: 

<221> NAME/KEY: misc_feature 
<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 174 

<400> SEQUENCE: 32 

ttgattctct ttgagcccag atgt 24 



<210> SEQ ID NO 33 

<211> LENGTH: 16 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer - PSO 175 

<400> SEQUENCE: 33 

gcctggagct gttaat 16 



<210> SEQ ID NO 34 

<211> LENGTH: 38 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS043SNP 

<400> SEQUENCE: 34 

agtcatggtg ctggggcact ggccgtcgtt ttacaacg 38 



<210> SEQ ID NO 35 

<211> LENGTH: 38 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS044SNP 

<400> SEQUENCE: 35 

agtcatggtg ctagggcact ggccgtcgtt ttacaacg 38 



<210> SEQ ID NO 36 

<211> LENGTH: 39 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS044SNP 

<400> SEQUENCE: 36 

agtcatggtg ctgggggcac tggccgtcgt tttacaacg 39 

<210> SEQ ID NO 37 

<211> LENGTH: 39 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS045SNP 

<400> SEQUENCE: 37 

agtcatggtg ctaggggcac tggccgtcgt tttacaacg 39 



<210> SEQ ID NO 38 

<211> LENGTH: 41 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS053SNP 

<400> SEQUENCE: 38 

agtcatggtg ctaagggggc actggccgtc gttttacaac g 41 



<210> SEQ ID NO 39 

<211> LENGTH: 41 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: oligonucleotide - PS054SNP 

<400> SEQUENCE: 39 

agtcatggtg ctaaaggggc actggccgtc gttttacaac g 41 



<210> SEQ ID NO 40 

<211> LENGTH: 17 

<212> TYPE: DNA 

<213> ORGANISM: artificial sequence 

<220> FEATURE: 

<221> NAME/KEY: misc_feature 

<222> LOCATION: ()..() 

<223> OTHER INFORMATION: Primer- PS055NUSPT 

<400> SEQUENCE: 40 

cgttgtaaaa cgacggc 17 



The invention claimed is: 

1. A method of determining the frequency of an allele in 
a population of nucleic acid molecules, said method comprising: 

pooling the nucleic acid molecules of said population to 
provide a pooled nucleic acid sample; performing 
primer extension reactions in a reaction mixture 
comprising said pooled nucleic acid sample and a primer 
which binds at a predetermined site located in said 
nucleic acid molecules, wherein said site is substantially 
adjacent to a polymorphic position of interest in 
said allele, to provide primer extension products, and 
wherein the primer extension reaction is performed by 
sequentially adding non-chain terminating nucleotides 
to the reaction mixture and quantitatively determining 
the incorporation or non-incorporation of each nucleotide 
as each nucleotide is added by bioluminometrically 
detecting the release of pyrophosphate; obtaining 
a pattern of nucleotide incorporation in said primer 
extension products at the positions that correspond to 
said polymorphic position of interest; and determining 
the frequency of said allele from said pattern of nucleotide 
incorporation. 

2. The method according to claim 1 wherein ELIDA 
detection enzymes are used to detect the release of pyro- 
phosphate. 

3. The method according to claim 2 wherein a 
nucleotide-degrading enzyme is included during the primer 
extension reaction. 

4. The method according to claim 1 wherein the nucleic 
acid molecules are immobilized on a solid support. 

5. The method according to claim 1 wherein the amount 
or concentration of the nucleic acid in each sample of the 
population which is pooled, is determined prior to pooling. 

6. The method according to claim 5 wherein the concen- 
tration of the nucleic acid in each sample of the population 
is determined by a primer-extension reaction prior to pool- 
ing. 

7. The method according to claim 6 wherein the volume 
of each nucleic acid in each sample to be pooled is adjusted 
in view of the amount or concentration of nucleic acid 
present such that the pooled sample contains substantially 
the same amount or concentration of each nucleic acid 
molecule in the population. 

8. The method according to claim 7 wherein a particular 
polymorphism is selected as a reference polymorphism and 
said primer extension reaction used to determine the con- 
centration of nucleic acid in said sample is specific for said 
reference polymorphism. 

9. The method according to claim 8 wherein said poly- 
morphism is chosen such that it gives no background signals 
in a primer-extension reaction and that the signals are even. 

10. The method according to claim 8 wherein said polymorphism 
is not present in a homopolymeric sequence and 
will not be preferentially amplified in any PCR-type 
reactions. 

11. The method according to claim 8 wherein a reference 
sample containing said polymorphism is selected as the 
main reference from one of the homozygotes of one of the 
alleles of said polymorphism (Ref 1) and another reference 
(Ref 2) containing said polymorphism is selected from the 
other homozygote, and the reference samples are pooled and 
primer extension reactions are performed, and the pattern of 
nucleotide incorporation determined to determine the relative 
concentration of each reference sample. 

12. The method according to claim 11 wherein the sample 
nucleic acid molecules to be tested are pooled individually 
with the reference samples. 

13. A method of determining the amount of an allele in a 
sample of nucleic acid molecules, said method comprising: 

performing primer extension reactions in a reaction mix- 
ture comprising said nucleic acid molecules, using a 
primer which binds at a predetermined site located in at 
least one said molecule wherein said site is substantially 
adjacent to a polymorphic position of interest in 
said allele, to provide primer extension products, and 
wherein the primer extension reaction is performed by 
sequentially adding non-chain terminating nucleotides 
to the reaction mixture and quantitatively determining 
the incorporation or non-incorporation of each nucle- 
otide as each nucleotide is added by bioluminometri- 
cally detecting the release of pyrophosphate; determin- 
ing the type and number of nucleotides incorporated in 
said primer extension products at positions that correspond 
to the polymorphic position of interest, and 
determining the amount of occurrence of said allele in 
said sample by analyzing the type and number of 
nucleotides incorporated. 

14. The method according to claim 13 wherein ELIDA 
detection enzymes are used to detect the release of pyro- 
phosphate. 

15. The method according to claim 14 wherein a nucle- 
otide-degrading enzyme is included during the primer exten- 
sion reaction. 

16. The method according to claim 15 wherein the nucleic 
acid molecules are immobilized on a solid support. 

(57) ABSTRACT 

There is disclosed an improved high-throughput and quan- 
titative process for determining methylation patterns in 
genomic DNA samples based on amplifying modified 
nucleic acid, and detecting methylated nucleic acid based on 
amplification-dependent displacement of specifically 
annealed hybridization probes. Specifically, the inventive 
process provides for treating genomic DNA samples with 
sodium bisulfite to create methylation-dependent sequence 
differences, followed by detection with fluorescence-based 
quantitative PCR techniques. The process is particularly 
well suited for the rapid analysis of a large number of nucleic 
acid samples, such as those from collections of tumor tissues 

PROCESS FOR HIGH THROUGHPUT DNA 
METHYLATION ANALYSIS 

TECHNICAL FIELD OF THE INVENTION 

[0001] The present invention provides an improved high- 
throughput and quantitative process for determining methy- 
lation patterns in genomic DNA samples. Specifically, the 
inventive process provides for treating genomic DNA 
samples with sodium bisulfite to create methylation-dependent 
sequence differences, followed by detection with 
fluorescence-based quantitative PCR techniques. 

BACKGROUND OF THE INVENTION 

[0002] In higher order eukaryotic organisms, DNA is 
methylated only at cytosines located 5' to guanosine in the 
CpG dinucleotide. This modification has important regula- 
tory effects on gene expression predominantly when it 
involves CpG rich areas (CpG islands) located in the 
promoter region of a gene sequence. Extensive methylation 
of CpG islands has been associated with transcriptional 
inactivation of selected imprinted genes and genes on the 
inactive X chromosome of females. Aberrant methylation of 
normally unmethylated CpG islands has been described as a 
frequent event in immortalized and transformed cells and 
has been frequently associated with transcriptional inacti- 
vation of tumor suppressor genes in human cancers. 

[0003] DNA methylases transfer methyl groups from a 
universal methyl donor, such as S-adenosyl methionine, to 
specific sites on the DNA. One biological function of DNA 
methylation in bacteria is protection of the DNA from 
digestion by cognate restriction enzymes. Mammalian cells 
possess methylases that methylate cytosine residues on DNA 
that are 5' neighbors of guanine (CpG). This methylation may 
play a role in gene inactivation, cell differentiation, tumori- 
genesis, X-chromosome inactivation, and genomic imprint- 
ing. CpG islands remain unmethylated in normal cells, 
except during X-chromosome inactivation and parental spe- 
cific imprinting where methylation of 5' regulatory regions 
can lead to transcriptional repression. DNA methylation is 
also a mechanism for changing the base sequence of DNA 
without altering its coding function. DNA methylation is a 
heritable, reversible and epigenetic change. Yet, DNA 
methylation has the potential to alter gene expression, which 
has profound developmental and genetic consequences. 

[0004] The methylation reaction involves flipping a target 
cytosine out of an intact double helix to allow the transfer of 
a methyl group from S-adenosylmethionine in a cleft of the 
enzyme DNA (cytosine-5)-methyltransferase (Klimasauskas 
et al., Cell 76:357-369, 1994) to form 5-methylcytosine 
(5-mCyt). This enzymatic conversion is the only epigenetic 
modification of DNA known to exist in vertebrates and is 
essential for normal embryonic development (Bird, Cell 
70:5-8, 1992; Laird and Jaenisch, Human Mol. Genet. 
3:1487-1495, 1994; and Li et al., Cell 69:915-926, 1992). 
The presence of 5-mCyt at CpG dinucleotides has resulted 
in a 5-fold depletion of this sequence in the genome during 
vertebrate evolution, presumably due to spontaneous 
deamination of 5-mCyt to T (Schorderet et al., Proc. Natl. Acad. Sci. 
USA 89:957-961, 1992). Those areas of the genome that do 
not show such suppression are referred to as "CpG islands" 
(Bird, Nature 321:209-213, 1986; and Gardiner-Garden et 
al., J. Mol. Biol. 196:261-282, 1987). These CpG island 
regions comprise about 1% of vertebrate genomes and also 
account for about 15% of the total number of CpG dinucle- 
otides (Bird, Nature 321:209-213, 1986). CpG islands are 
typically between 0.2 to about 1 kb in length and are located 
upstream of many housekeeping and tissue-specific genes, 
but may also extend into gene coding regions. Therefore, it 
is the methylation of cytosine residues within CpG islands in 
somatic tissues, which is believed to affect gene function by 
altering transcription (Cedar, Cell 53:3-4, 1988). 
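
The CpG suppression described above can be illustrated with a minimal Python sketch that computes the ratio of observed to expected CpG dinucleotides in a sequence, a measure commonly used when identifying CpG islands. The function name and the example sequences are hypothetical and are not part of the original disclosure; the sketch ignores sequence length and GC-content thresholds.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    A ratio well below 1 indicates CpG depletion; CpG islands are
    regions that do not show this depletion.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG can occur without both C and G
    return s.count("CG") * len(s) / (n_c * n_g)


# Hypothetical example sequences (illustrative only).
print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich
print(cpg_observed_expected("CCAGGTCA"))  # no CpG dinucleotides
```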

[0005] Methylation of cytosine residues contained within 
CpG islands of certain genes has been inversely correlated 
with gene activity. This could lead to decreased gene expres- 
sion by a variety of mechanisms including, for example, 
disruption of local chromatin structure, inhibition of tran- 
scription factor-DNA binding, or by recruitment of proteins 
which interact specifically with methylated sequences indi- 
rectly preventing transcription factor binding. In other 
words, there are several theories as to how methylation 
affects mRNA transcription and gene expression, but the 
exact mechanism of action is not well understood. Some 
studies have demonstrated an inverse correlation between 
methylation of CpG islands and gene expression; however, 
most CpG islands on autosomal genes remain unmethylated 
in the germline and methylation of these islands is usually 
independent of gene expression. Tissue-specific genes are 
usually unmethylated in the receptive target organs but are 
methylated in the germline and in non-expressing adult 
tissues. CpG islands of constitutively-expressed housekeeping 
genes are normally unmethylated in the germline and in 
somatic tissues. 

[0006] Abnormal methylation of CpG islands associated 
with tumor suppressor genes may also cause decreased gene 
expression. Increased methylation of such regions may lead 
to progressive reduction of normal gene expression resulting 
in the selection of a population of cells having a selective 
growth advantage (i.e., a malignancy). 

[0007] It is considered that an altered DNA methylation 
pattern, particularly methylation of cytosine residues, causes 
genome instability and is mutagenic. This, presumably, has 
led to an 80% suppression of a CpG methyl acceptor site in 
eukaryotic organisms, which methylate their genomes. 
Cytosine methylation further contributes to generation of 
polymorphism and germ-line mutations and to transition 
mutations that inactivate tumor-suppressor genes (Jones, 
Cancer Res. 56:2463-2467, 1996). Methylation is also 
required for embryonic development of mammals (Li et al., 
Cell 69:915-926, 1992). It appears that the methylation of 
CpG-rich promoter regions may be blocking transcriptional 
activity. Ushijima et al. (Proc. Natl. Acad. Sci. USA 
94:2284-2289, 1997) characterized and cloned DNA fragments that 
show methylation changes during murine hepatocarcinogen- 
esis. Data from a group of studies of altered methylation 
sites in cancer cells show that it is not simply the overall 
levels of DNA methylation that are altered in cancer, but 
changes in the distribution of methyl groups. 

[0008] These studies suggest that methylation at CpG-rich 
sequences, known as CpG islands, provides an alternative 
pathway for the inactivation of tumor suppressors. Methy- 
lation of CpG oligonucleotides in the promoters of tumor 
suppressor genes can lead to their inactivation. Other studies 
provide data that alterations in the normal methylation 
process are associated with genomic instability (Lengauer et 
al., Proc. Natl. Acad. Sci. USA 94:2545-2550, 1997). Such 
abnormal epigenetic changes may be found in many types of 
cancer and can serve as potential markers for oncogenic 
transformation, provided that there is a reliable means for 
rapidly determining such epigenetic changes. Therefore, 
there is a need in the art for a reliable and rapid (high- 
throughput) method for determining methylation as the 
preferred epigenetic alteration. 

[0009] Methods to Determine DNA Methylation 

[0010] There are a variety of genome scanning methods 
that have been used to identify altered methylation sites in 
cancer cells. For example, one method involves restriction 
landmark genomic scanning (Kawai et al., Mol. Cell. Biol. 
14:7421-7427, 1994), and another example involves 
methylation-sensitive arbitrarily primed PCR (Gonzalgo et al., 
Cancer Res. 57:594-599, 1997). Changes in methylation 
patterns at specific CpG sites have been monitored by 
digestion of genomic DNA with methylation-sensitive 
restriction enzymes followed by Southern analysis of the 
regions of interest (digestion-Southern method). The diges- 
tion-Southern method is a straightforward method but it has 
inherent disadvantages in that it requires a large amount of 
high molecular weight DNA (at least 5 μg) 
and has a limited scope for analysis of CpG sites (as determined 
by the presence of recognition sites for methylation-sensitive 
restriction enzymes). Another method for analyzing 
changes in methylation patterns involves a PCR-based 
process that involves digestion of genomic DNA with 
methylation-sensitive restriction enzymes prior to PCR 
amplification (Singer-Sam et al., Nucl. Acids Res. 18:687, 
1990). However, this method has not been shown effective 
because of a high degree of false positive signals (methy- 
lation present) due to inefficient enzyme digestion or over- 
amplification in a subsequent PCR reaction. 

[0011] Genomic sequencing has been simplified for analysis 
of DNA methylation patterns and 5-methylcytosine 
distribution by using bisulfite treatment (Frommer et al., Proc. 
Natl. Acad. Sci. USA 89:1827-1831, 1992). Bisulfite treat- 
ment of DNA distinguishes methylated from unmethylated 
cytosines; but original bisulfite genomic sequencing requires 
large-scale sequencing of multiple plasmid clones to deter- 
mine overall methylation patterns, which prevents this tech- 
nique from being commercially useful for determining 
methylation patterns in any type of a routine diagnostic 
assay. 

[0012] In addition, other techniques have been reported 
which utilize bisulfite treatment of DNA as a starting point 
for methylation analysis. These include methylation-specific 
PCR (MSP) (Herman et al., Proc. Natl. Acad. Sci. USA 
93:9821-9826, 1996); and restriction enzyme digestion of 
PCR products amplified from bisulfite-converted DNA 
(Sadri and Hornsby, Nucl. Acids Res. 24:5058-5059, 1996; 
and Xiong and Laird, Nucl. Acids Res. 25:2532-2534, 1997). 

[0013] PCR techniques have been developed for detection 
of gene mutations (Kuppuswamy et al., Proc. Natl. Acad. 
Sci. USA 88:1143-1147, 1991) and quantitation of 
allelic-specific expression (Szabo and Mann, Genes Dev. 
9:3097-3108, 1995; and Singer-Sam et al., PCR Methods Appl. 
1:160-163, 1992). Such techniques use internal primers, 
which anneal to a PCR-generated template and terminate 
immediately 5' of the single nucleotide to be assayed. 
However, an allelic-specific expression technique has not 
been tried within the context of assaying for DNA methy- 
lation patterns. 

[0014] Most molecular biological techniques used to ana- 
lyze specific loci, such as CpG islands in complex genomic 
DNA, involve some form of sequence-specific amplifica- 
tion, whether it is biological amplification by cloning in E. 
coli, direct amplification by PCR or signal amplification by 
hybridization with a probe that can be visualized. Since 
DNA methylation is added post-replicatively by a dedicated 
maintenance DNA methyltransferase that is not present in 
either E. coli or in the PCR reaction, such methylation 
information is lost during molecular cloning or PCR 
amplification. Moreover, molecular hybridization does not 
discriminate between methylated and unmethylated DNA, 
since the methyl group on the cytosine does not participate 
in base pairing. The lack of a facile way to amplify the 
methylation information in complex genomic DNA has 
probably been a most important impediment to DNA methy- 
lation research. Therefore, there is a need in the art to 
improve upon methylation detection techniques, especially 
in a quantitative manner. 

[0015] The indirect methods for DNA methylation pattern 
determinations at specific loci that have been developed rely 
on techniques that alter the genomic DNA in a 
methylation-dependent manner before the amplification event. There are 
two primary methods that have been utilized to achieve this 
methylation-dependent DNA alteration. The first is digestion 
by a restriction enzyme that is affected in its activity by 
5-methylcytosine in a CpG sequence context. The cleavage, 
or lack of it, can subsequently be revealed by Southern 
blotting or by PCR. The other technique that has received 
recent widespread use is the treatment of genomic DNA 
with sodium bisulfite. Sodium bisulfite treatment converts 
all unmethylated cytosines in the DNA to uracil by 
deamination, but leaves the methylated cytosine residues intact. 
Subsequent PCR amplification replaces the uracil residues 
with thymines and the 5-methylcytosine residues with 
cytosines. The resulting sequence difference has been 
detected using standard DNA sequence detection tech- 
niques, primarily PCR. 
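
The sequence change produced by bisulfite treatment followed by PCR, as described above, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: only one strand is modeled, conversion is treated as complete, and methylation is supplied as a set of 0-based positions. The function name, the input format and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the sequence read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T; methylated C
    (5-methylcytosine) is left intact and amplified as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # C -> U (bisulfite) -> T (PCR)
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical sequence with CpGs at positions 1 and 4.
print(bisulfite_convert("ACGTCG", methylated_positions={1}))  # ACGTTG
print(bisulfite_convert("ACGTCG"))                            # ATGTTG
```

The two outputs differ only at the position of the methylated cytosine, which is the sequence difference that the downstream detection techniques exploit.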

[0016] Many DNA methylation detection techniques uti- 
lize bisulfite treatment. Currently, all bisulfite treatment- 
based methods are followed by a PCR reaction to analyze 
specific loci within the genome. There are two principally 
different ways in which the sequence difference generated 
by the sodium bisulfite treatment can be revealed. The first 
is to design PCR primers that uniquely anneal with either 
methylated or unmethylated converted DNA. This technique 
is referred to as "methylation specific PCR" or "MSP". The 
method used by all other bisulfite-based techniques (such as 
bisulfite genomic sequencing, COBRA and Ms-SNuPE) is to 
amplify the bisulfite-converted DNA using primers that 
anneal at locations that lack CpG dinucleotides in the 
original genomic sequence. In this way, the PCR primers can 
amplify the sequence in between the two primers, regardless 
of the DNA methylation status of that sequence in the 
original genomic DNA. This results in a pool of different 
PCR products, all with the same length and differing in their 
sequence only at the sites of potential DNA methylation at 
CpGs located in between the two primers. The difference 
between these methods of processing the bisulfite-converted 
sequence is that in MSP, the methylation information is 
derived from the occurrence or lack of occurrence of a PCR 
product, whereas in the other techniques a mix of products 
is always generated and the mixture is subsequently ana- 
lyzed to yield quantitative information on the relative occur- 
rence of the different methylation states. 
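
For the non-discriminating approach described above, the quantitative readout amounts to asking what fraction of the mixed PCR products retains a C at a given CpG position. The following Python sketch illustrates this under the assumption that the products are available as aligned, equal-length top-strand sequences; the function name and the example reads are hypothetical.

```python
def methylation_fraction(reads, position):
    """Fraction of reads showing C (methylated) rather than T
    (unmethylated) at a CpG cytosine position after bisulfite PCR.

    Returns None when no read shows C or T at that position.
    """
    n_c = sum(1 for read in reads if read[position] == "C")
    n_t = sum(1 for read in reads if read[position] == "T")
    total = n_c + n_t
    return n_c / total if total else None


# Hypothetical pool of four products; position 1 is a CpG cytosine.
pool = ["ACGT", "ATGT", "ACGT", "ACGT"]
print(methylation_fraction(pool, 1))  # 0.75
```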

[0017] MSP is a qualitative technique. There are two 
reasons that it is not quantitative. The first is that methylation 
information is derived from the comparison of two separate 
PCR reactions (the methylated and the unmethylated 
version). There are inherent difficulties in making kinetic 
comparisons of two different PCR reactions. The other problem 
with MSP is that often the primers cover more than one CpG 
dinucleotide. The consequence is that multiple sequence 
variants can be generated, depending on the DNA methyla- 
tion pattern in the original genomic DNA. For instance, if 
the forward primer is a 24-mer oligonucleotide that covers 
3 CpGs, then 2^3 = 8 different theoretical sequence 
permutations could arise in the genomic DNA following bisulfite 
conversion within this 24-nucleotide sequence. If only a 
fully methylated and a fully unmethylated reaction are run, 
then only 2 of the 8 possible methylation states are actually 
investigated. The situation is further complicated if the 
intermediate methylation states lead to amplification, but 
with reduced efficiency. Therefore, the MSP technique is 
non-quantitative. Accordingly, there is a need in the art to 
improve the MSP technique, making it more quantitative 
and amenable to greater throughput. 
The present invention addresses this need for a more rapid 
and quantitative methylation assay. 
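
The counting argument above (a primer site covering 3 CpGs gives 2^3 = 8 possible converted sequences) can be checked with a short Python sketch. The 24-nucleotide site below is a hypothetical sequence chosen for illustration and is not taken from the disclosure; cytosines outside CpG dinucleotides are assumed to be unmethylated and therefore always converted.

```python
from itertools import product

# Hypothetical 24-nucleotide primer site containing three CpG dinucleotides
# (illustrative only; not a sequence from the disclosure).
SITE = "ACGTTAGCGATTGACGTAGGATTA"


def converted_variants(site):
    """Map each methylation state of the CpGs in `site` to the sequence
    obtained after bisulfite conversion and PCR (unmethylated C read as T)."""
    site = site.upper()
    cpg_positions = [i for i in range(len(site) - 1) if site[i:i + 2] == "CG"]
    variants = {}
    for state in product((True, False), repeat=len(cpg_positions)):
        methylated = {p for p, m in zip(cpg_positions, state) if m}
        variants[state] = "".join(
            "T" if base == "C" and i not in methylated else base
            for i, base in enumerate(site)
        )
    return variants


variants = converted_variants(SITE)
print(len(variants))  # 2**3 = 8 methylation states for three CpGs
```

A pair of MSP reactions designed for the fully methylated and fully unmethylated states corresponds to only two keys of this dictionary, `(True, True, True)` and `(False, False, False)`.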

SUMMARY OF THE INVENTION 

[0018] The present invention provides a method for 
detecting a methylated CpG island within a genomic sample 
of DNA comprising: 

[0019] (a) contacting a genomic sample of DNA from 
a patient with a modifying agent that modifies unm- 
ethylated cytosine to produce a converted nucleic 
acid; 

[0020] (b) amplifying the converted nucleic acid by 
means of two oligonucleotide primers in the pres- 
ence or absence of one or a plurality of specific 
oligonucleotide probes, wherein one or more of the 
oligonucleotide primers and/or probes are capable of 
distinguishing between unmethylated and methy- 
lated nucleic acid; and 

[0021] (c) detecting the methylated nucleic acid 
based on amplification-mediated displacement of the 
probe. Preferably, the amplifying step is a polymerase 
chain reaction (PCR) and the modifying 
agent is bisulfite. Preferably, the converted nucleic 
acid contains uracil in place of unmethylated 
cytosine residues present in the unmodified genomic 
sample of DNA. Preferably, the probe further comprises 
a fluorescence label moiety and the amplification 
and detection step comprises fluorescence-based 
quantitative PCR. 

[0022] The invention provides a method for detecting a 
methylated CpG-containing nucleic acid comprising: 

[0023] (a) contacting a nucleic acid-containing 
sample with a modifying agent that modifies unm- 
ethylated cytosine to produce a converted nucleic 
acid; 



[0024] (b) amplifying the converted nucleic acid in 
the sample by means of oligonucleotide primers in 
the presence of a CpG-specific oligonucleotide 
probe, wherein the CpG-specific probe, but not the 
primers, distinguishes between modified unmethylated 
and methylated nucleic acid; and 

[0025] (c) detecting the methylated nucleic acid 
based upon an amplification-mediated displacement 
of the CpG-specific probe. Preferably, the amplify- 
ing step comprises a polymerase chain reaction 
(PCR) and the modifying agent comprises bisulfite. 
Preferably, the converted nucleic acid contains uracil 
in place of unmethylated cytosine residues present in 
the unmodified nucleic acid-containing sample. Preferably, 
the detection method is by means of a measurement 
of a fluorescence signal based on 
amplification-mediated displacement of the CpG-specific 
probe and the amplification and detection method 
comprises fluorescence-based quantitative PCR. The 
methylation amounts in the nucleic acid sample are 
quantitatively determined based on reference to a 
control reaction for amount of input nucleic acid. 
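
One common way to express a quantitative PCR result relative to a control reaction for input nucleic acid is a threshold-cycle difference calculation. The Python sketch below is offered only as an illustration under stated assumptions: the disclosure does not specify this formula, the function and parameter names are hypothetical, and the default assumes that both reactions double the product every cycle.

```python
def relative_methylation(ct_methylated, ct_control, efficiency=2.0):
    """Amount of methylated template relative to the input control.

    ct_methylated: threshold cycle of the methylation-specific reaction.
    ct_control:    threshold cycle of the control reaction for input DNA.
    efficiency:    assumed fold amplification per cycle (2.0 = doubling).
    """
    return efficiency ** (ct_control - ct_methylated)


# Hypothetical threshold cycles: the methylation-specific signal appears
# one cycle after the control, i.e. half the amount of the control.
print(relative_methylation(26.0, 25.0))  # 0.5
```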

[0026] The present invention further provides a method 
for detecting a methylated CpG-containing nucleic acid 
comprising: 

[0027] (a) contacting a nucleic acid-containing 
sample with a modifying agent that modifies unm- 
ethylated cytosine to produce a converted nucleic 
acid; 

[0028] (b) amplifying the converted nucleic acid in 
the sample by means of oligonucleotide primers and 
in the presence of a CpG-specific oligonucleotide 
probe, wherein both the primers and the CpG-specific 
probe distinguish between modified unmethylated 
and methylated nucleic acid; and 

[0029] (c) detecting the methylated nucleic acid 
based on amplification-mediated displacement of the 
CpG-specific probe. Preferably, the amplifying step 
is a polymerase chain reaction (PCR) and the modifying 
agent is bisulfite. Preferably, the converted 
nucleic acid contains uracil in place of unmethylated 
cytosine residues present in the unmodified nucleic 
acid-containing sample. Preferably, the detection 
method comprises measurement of a fluorescence 
signal based on amplification-mediated displacement 
of the CpG-specific probe and the amplification 
and detection method comprises fluorescence-based 
quantitative PCR. 

[0030] The present invention further provides a methyla- 
tion detection kit useful for the detection of a methylated 
CpG-containing nucleic acid comprising a carrier means 
being compartmentalized to receive in close confinement 
therein one or more containers comprising: 

[0031] (i) a first container containing a modifying 
agent that modifies unmethylated cytosine to pro- 
duce a converted nucleic acid; 

[0032] (ii) a second container containing primers for 
amplification of the converted nucleic acid; 

[0033] (iii) a third container containing primers for 
the amplification of control unmodified nucleic acid; 
and 

[0034] (iv) a fourth container containing a specific 
oligonucleotide probe the detection of which is based 
on amplification-mediated displacement, 

[0035] wherein the primers and probe each may or may 
not distinguish between unmethylated and methylated 
nucleic acid. Preferably, the modifying agent comprises 
bisulfite. Preferably, the modifying agent converts cytosine 
residues to uracil residues. Preferably, the specific oligo- 
nucleotide probe is a CpG-specific oligonucleotide probe, 
wherein the probe, but not the primers for amplification of 
the converted nucleic acid, distinguishes between modified 
unmethylated and methylated nucleic acid. Alternatively, the 
specific oligonucleotide probe is a CpG-specific oligonucle- 
otide probe, wherein both the probe and the primers for 
amplification of the converted nucleic acid, distinguish 
between modified unmethylated and methylated nucleic 
acid. Preferably, the probe further comprises a fluorescent 
moiety linked to an oligonucleotide base directly or through 
a linker moiety and the probe is a specific, dual-labeled 
TaqMan probe. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0036] FIG. 1 shows an outline of the MSP technology 
(prior art) using PCR primers that initially discriminate 
between methylated and unmethylated (bisulfite-converted)
DNA. The top part shows the result of the MSP process
when unmethylated single-stranded genomic DNA is ini-
tially subjected to sodium bisulfite conversion (deamination
of unmethylated cytosine residues to uracil) followed by 
PCR reactions with the converted template, such that a PCR 
product appears only with primers specifically annealing to 
converted (and hence unmethylated) DNA. The bottom 
portion shows the contrasting result when a methylated 
single-stranded genomic DNA sample is used. Again, the 
process first provides for bisulfite treatment followed by 
PCR reactions such that a PCR product appears only with 
primers specifically annealing to unconverted (and hence 
initially methylated) DNA. 

[0037] FIG. 2 shows an alternate process for evaluating 
DNA methylation with sodium bisulfite-treated genomic
DNA using nondiscriminating (with respect to methylation
status) forward and reverse PCR primers to amplify a
specific locus. In this illustration, denatured (i.e., single-
stranded) genomic DNA is provided that has mixed methy-
lation status, as would typically be found in a sample for
analysis. The sample is converted in a standard sodium
bisulfite reaction and the mixed products are amplified by a
PCR reaction using primers that do not overlap any CpG
dinucleotides. This produces an unbiased (with respect to
methylation status) heterogeneous pool of PCR products. 
The mixed or heterogeneous pool can then be analyzed by 
a technique capable of detecting sequence differences, 
including direct DNA sequencing, subcloning of PCR frag- 
ments followed by sequencing of representative clones, 
single-nucleotide primer extension reaction (MS-SNuPE), 
or restriction enzyme digestion (COBRA). 

[0038] FIG. 3 shows a flow diagram of the inventive
process in several, but not all, alternative embodiments for
PCR product analysis. Variations in detection methodology,
such as the use of dual probe technology (Lightcycler®) or
fluorescent primers (Sunrise® technology) are not shown in
this Figure. Specifically, the inventive process begins with a
mixed sample of genomic DNA that is converted in a sodium
bisulfite reaction to a mixed pool of methylation-dependent
sequence differences according to standard procedures (the
bisulfite process converts unmethylated cytosine residues to
uracil). Fluorescence-based PCR is then performed either in
an "unbiased" PCR reaction with primers that do not overlap
known CpG methylation sites (left arm of FIG. 3), or in a
"biased" reaction with PCR primers that overlap known
CpG dinucleotides (right arm of FIG. 3). Sequence discrimi-
nation can occur either at the level of the amplification
process (C and D) or at the level of the fluorescence
detection process (B), or both (D). A quantitative test for
methylation patterns in the genomic DNA sample is shown
on the left arm (B), wherein sequence discrimination occurs
at the level of probe hybridization. In this version, the PCR
reaction provides for unbiased amplification in the presence
of a fluorescent probe that overlaps a particular putative
methylation site. An unbiased control for the amount of
input DNA is provided by a reaction in which neither the
primers nor the probe overlie any CpG dinucleotides (A).
Alternatively, as shown in the right arm of FIG. 3, a
qualitative test for genomic methylation is achieved by
probing of the biased PCR pool with either control oligo-
nucleotides that do not "cover" known methylation sites (C;
a fluorescence-based version of the MSP technique), or with
oligonucleotides covering potential methylation sites (D).

[0039] FIG. 4 shows a flow chart overview of the inven-
tive process employing a "TaqMan®" probe in the ampli-
fication process. Briefly, double-stranded genomic DNA is
treated with sodium bisulfite and subjected to one of two sets
of PCR reactions using TaqMan® probes; namely with
either biased primers and TaqMan® probe (left column), or
unbiased primers and TaqMan® probe (right column). The
TaqMan® probe is dual-labeled with a fluorescent
"reporter" (labeled "R" in FIG. 4) and "quencher" (labeled
"Q") molecules, and is designed to be specific for a rela-
tively high GC content region so that it melts out at about
10° C. higher temperature in the PCR cycle than the forward
or reverse primers. This allows it to remain fully hybridized
during the PCR annealing/extension step. As the Taq poly-
merase enzymatically synthesizes a new strand during PCR,
it will eventually reach the annealed TaqMan® probe. The
Taq polymerase 5' to 3' endonuclease activity will then
displace the TaqMan® probe by digesting it to release the
fluorescent reporter molecule for quantitative detection of its
now unquenched signal using a real-time fluorescent system
as described herein.

[0040] FIG. 5 shows a comparison of the inventive assay
to a conventional COBRA assay. Panel A shows a COBRA
gel used to determine the level of DNA methylation at the
ESR1 locus in DNAs of known methylation status: sperm
(unmethylated) and HCT116 (methylated). The relative
amounts of the cleaved products are indicated below the gel.
A 56-bp fragment represents DNA molecules in which the
TaqI site proximal to the hybridization probe is methylated
in the original genomic DNA. The 86-bp fragment repre-
sents DNA molecules in which the proximal TaqI site is
unmethylated and the distal site is methylated. Panel B
summarizes the COBRA results and compares them to
results obtained with the methylated and unmethylated ver-
sion of the inventive assay process. The results are expressed
as ratios between the methylation-specific reactions and a
control reaction. For the bisulfite-treated samples, the con-
trol reaction was a MYOD1 assay as described in Example
1. For the untreated samples, the ACTB primers described
for the RT-PCR reactions were used as a control to verify the
input of unconverted DNA samples. (The ACTB primers do
not span an intron). "No PCR" indicates that no PCR
product was obtained on unconverted genomic DNA with
COBRA primers designed to amplify bisulfite-converted DNA
sequences.

[0041] FIG. 6 illustrates a determination of the specificity
of the oligonucleotides. Eight different combinations of
forward primer, probe and reverse primer were tested on
DNA samples with known methylation or lack of methyla-
tion at the ESR1 locus. Panel A shows the nomenclature
used for the combinations of the ESR1 oligos. "U" refers to
the oligo sequence that anneals with bisulfite-converted
unmethylated DNA, while "M" refers to the methylated
version. Position 1 indicates the forward PCR primer, posi-
tion 2 the probe, and position 3 the reverse primer. The
combinations used for the eight reactions are shown below
each pair of bars, representing duplicate experiments. The
results are expressed as ratios between the ESR1 values and
the MYOD1 control values. Panel B represents an analysis of
human sperm DNA. Panel C represents an analysis of DNA
obtained from the human colorectal cancer cell line
HCT116.

[0042] FIG. 7 shows a test of the reproducibility of the 
reactions. Assays were performed in eight independent reac- 
tions to determine the reproducibility on samples of complex 
origin. A primary human colorectal adenocarcinoma and
matched normal mucosa were used for this purpose (samples
10N and 10T shown in FIG. 8). The results shown in this
figure represent the raw values obtained in the assay. The 
values have been plate-normalized, but not corrected for 
input DNA. The bars indicate the mean values obtained for 
the eight separate reactions. The error bars represent the 
standard error of the mean. 

[0043] FIG. 8 illustrates a comparison of MLH1 expres-
sion, microsatellite instability and MLH1 promoter methy-
lation of 25 matched-paired human colorectal samples. The
upper chart shows the MLH1 expression levels measured by
quantitative, real time RT-PCR (TaqMan®) in matched
normal (hatched bars) and tumor (solid black bars) colorec-
tal samples. The expression levels are displayed as a ratio
between MLH1 and ACTB measurements. Microsatellite
instability status (MSI) is indicated by the circles located
between the two charts. A black circle denotes MSI posi-
tivity, while an open circle indicates that the sample is MSI
negative, as determined by analysis of the BAT25 and
BAT26 loci. The lower chart shows the methylation status of
the MLH1 locus as determined by an inventive process. The
methylation levels are represented as the ratio between the
MLH1 methylated reaction and the MYOD1 reaction.

DETAILED DESCRIPTION OF THE 
INVENTION 

[0044] The present invention provides a rapid, sensitive,
reproducible high-throughput method for detecting methy-
lation patterns in samples of nucleic acid. The invention
provides for methylation-dependent modification of the
nucleic acid, and then uses processes of nucleic acid ampli-
fication, detection, or both to distinguish between methy-
lated and unmethylated residues present in the original
sample of nucleic acid. In a preferred embodiment, the
invention provides for determining the methylation status of
CpG islands within samples of genomic DNA.

[0045] In contrast to previous methods for determining 
methylation patterns, detection of the methylated nucleic 
acid is relatively rapid and is based on amplification- 
mediated displacement of specific oligonucleotide probes. In 
a preferred embodiment, amplification and detection, in fact, 
occur simultaneously as measured by fluorescence-based
real-time quantitative PCR ("RT-PCR") using specific, dual-
labeled TaqMan® oligonucleotide probes. The displaceable
probes can be specifically designed to distinguish between 
methylated and unmethylated CpG sites present in the 
original, unmodified nucleic acid sample. 

[0046] Like the technique of methylation-specific PCR
("MSP"; U.S. Pat. No. 5,786,146), the present invention
provides for significant advantages over previous PCR-
based and other methods (e.g., Southern analyses) used for
determining methylation patterns. The present invention is
substantially more sensitive than Southern analysis, and
facilitates the detection of a low number (percentage) of
methylated alleles in very small nucleic acid samples, as
well as paraffin-embedded samples. Moreover, in the case of
genomic DNA, analysis is not limited to DNA sequences
recognized by methylation-sensitive restriction endonu-
cleases, thus allowing for fine mapping of methylation
patterns across broader CpG-rich regions. The present
invention also eliminates any false-positive results, due
to incomplete digestion by methylation-sensitive restriction
enzymes, inherent in previous PCR-based methylation
methods.

[0047] The present invention also offers significant advan-
tages over MSP technology. It can be applied as a quanti-
tative process for measuring methylation amounts, and is
substantially more rapid. One important advance over MSP
technology is the elimination of gel electrophoresis, which
is not only a time-consuming manual task that limits
high-throughput capabilities; the associated manipulation
and opening of the PCR reaction tubes also increases the
chance of sample misidentification and greatly increases the
chance of contaminating future PCR reactions with trace
PCR products. The standard method of avoiding PCR
contamination by uracil incorporation and the use of Uracil
DNA Glycosylase (AmpErase) is incompatible with bisulfite
technology, due to the presence of uracil in bisulfite-treated
DNA. Therefore, the avoidance of PCR product contamination
in a high-throughput application with bisulfite-treated DNA
is a greater technical challenge than for the amplification
of unmodified DNA. The present invention does not require
any post-PCR manipulation or processing. This not only
greatly reduces the amount of labor involved in the analysis
of bisulfite-treated DNA, but it also provides a means to
avoid handling of PCR products that could contaminate
future reactions.

[0048] Two factors limit MSP to, at best, semi-quantitative 
applications. First, MSP methylation information is derived 
from the comparison of two separate PCR reactions (the 
methylated and the unmethylated versions). There are inher- 
ent difficulties in making kinetic comparisons of two differ- 
ent PCR reactions without a highly quantitative method of 
following the amplification reaction, such as Real-Time 
Quantitative PCR. The other problem relates to the fact that 
MSP amplification is provided for by means of particular
CpG-specific oligonucleotides; that is, by biased primers. 



US 2002/0086324 A1



6 



Jul. 4, 2002 



Often, the DNA sequence covered by such primers contains 
more than one CpG dinucleotide with the consequence that 
the sequence amplified will represent only one of multiple
potential sequence variants present, depending on the DNA 
methylation pattern in the original genomic DNA. For 
instance, if the forward primer is a 24-mer oligonucleotide 
that covers 3 CpGs, then 2^3=8 different theoretical sequence
permutations could arise in the genomic DNA following
bisulfite conversion within this 24-nucleotide sequence. If
only a fully methylated and a fully unmethylated reaction is 
run, then only 2 out of the 8 possible methylation states are 
analyzed. 
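
As an illustrative aside that is not part of the original disclosure, the count in this example can be reproduced by enumerating every methylation pattern across the CpGs covered by a primer. The Python sketch below is a minimal one; the function name `methylation_states` and the "M"/"U" notation are assumptions made purely for illustration.

```python
from itertools import product


def methylation_states(n_cpg):
    """Return every methylation pattern for n_cpg CpG sites.

    Each pattern is a string with one character per CpG:
    'M' for methylated, 'U' for unmethylated.
    """
    return ["".join(p) for p in product("MU", repeat=n_cpg)]


# A primer covering 3 CpGs, as in the 24-mer example in the text.
states = methylation_states(3)

# The two MSP reactions match only the fully methylated and the
# fully unmethylated patterns.
msp_covered = [s for s in states if s in ("MMM", "UUU")]

print(len(states), "possible states;", len(msp_covered), "analyzed by MSP")
```

The remaining six patterns are the intermediate methylation states that neither MSP reaction is designed to match.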

[0049] The situation is further complicated if the interme-
diate methylation states are non-specifically amplified by the
fully methylated or fully unmethylated primers. Accord-
ingly, the MSP patent explicitly describes a non-quantitative
technique based on the occurrence or non-occurrence of a
PCR product in the fully methylated versus fully unmethy-
lated reaction, rather than a comparison of the kinetics of the
two reactions.

[0050] By contrast, one embodiment of the present inven- 
tion provides for the unbiased amplification of all possible 
methylation states using primers that do not cover any CpG 
sequences in the original, unmodified DNA sequence. To the 
extent that all methylation patterns are amplified equally, 
quantitative information about DNA methylation patterns 
can then be distilled from the resulting PCR pool by any 
technique capable of detecting sequence differences (e.g., by 
fluorescence-based PCR).

[0051] Furthermore, the present invention is substantially
faster than MSP. As indicated above, MSP relies on the
occurrence or non-occurrence of a PCR product in the
methylated versus unmethylated reaction to determine the
methylation status of a CpG sequence covered by a primer.
Minimally, this requires performing agarose or polyacryla-
mide gel electrophoretic analysis (see U.S. Pat. No.
5,786,146, FIGS. 2A-2E, and 3A-3E). Moreover, determining the
methylation status of any CpG sites within a given MSP
amplified region would require additional analyses such as:
(a) restriction endonuclease analysis either before, or after
(e.g., COBRA analysis; Xiong and Laird, Nucleic Acids Res.
25:2532-2534, 1997) nucleic acid modification and ampli-
fication, provided that either the unmodified sequence region
of interest contains methylation-sensitive sites, or that modi-
fication (e.g., bisulfite) results in creating or destroying
restriction sites; (b) single nucleotide primer extension reac-
tions (Ms-SNuPE; Gonzalgo and Jones, Nucleic Acids Res.
25:2529-2531, 1997); or (c) DNA sequencing of the amplifi-
cation products. Such additional analyses are not only sub-
ject to error (incomplete restriction enzyme digestion), but
also add substantial time and expense to the process of
determining the CpG methylation status of, for example,
samples of genomic DNA.

[0052] By contrast, in a preferred embodiment of the 
present invention, amplification and detection occur simul-
taneously as measured by fluorescence-based real-time
quantitative PCR using specific, dual-labeled oligonucle- 
otide probes. In principle, the methylation status at any 
probe-specific sequence within an amplified region can be 
determined contemporaneously with amplification, with no 
requirement for subsequent manipulation or analysis. 

[0053] As disclosed by MSP inventors, "[t]he only tech-
nique that can provide more direct analysis than MSP for
most CpG sites within a defined region is genomic sequenc-
ing." (U.S. Pat. No. 5,786,146 at 5, line 15-17). The present
invention provides, in fact, a method for the partial direct
sequencing of modified CpG sites within a known (previ-
ously sequenced) region of genomic DNA. Thus, a series of
CpG-specific TaqMan® probes, each corresponding to a
particular methylation site in a given amplified DNA region,
are constructed. This series of probes is then utilized in
parallel amplification reactions, using aliquots of a single,
modified DNA sample, to simultaneously determine the
complete methylation pattern present in the original unmodi-
fied sample of genomic DNA. This is accomplished in a
fraction of the time and expense required for direct sequenc-
ing of the sample of genomic DNA, and is substantially
more sensitive. Moreover, one embodiment of the present
invention provides for a quantitative assessment of such a
methylation pattern.

[0054] The present invention has identified four process 
techniques and associated diagnostic kits, utilizing a methy- 
lation-dependent nucleic acid modifying agent (e.g., 
bisulfite), to both qualitatively and quantitatively determine 
CpG methylation status in nucleic acid samples (e.g., 
genomic DNA samples). The four processes are outlined in 
FIG. 3 and labeled at the bottom with the letters A through 
D. Overall, methylated-CpG sequence discrimination is
designed to occur at the level of amplification, probe hybrid-
ization or at both levels. For example, applications C and D
utilize "biased" primers that distinguish between modified
unmethylated and methylated nucleic acid and provide
methylated-CpG sequence discrimination at the PCR ampli-
fication level. Process B uses "unbiased" primers (that do
not cover CpG methylation sites), to provide for unbiased
amplification of modified nucleic acid, but rather utilizes
probes that distinguish between modified unmethylated and 
methylated nucleic acid to provide for quantitative methy- 
lated-CpG sequence discrimination at the detection level 
(e.g., at the fluorescent (or luminescent) probe hybridization 
level only). Process A does not, in itself, provide for methy- 
lated-CpG sequence discrimination at either the amplifica- 
tion or detection levels, but supports and validates the other 
three applications by providing control reactions for input 
DNA. 
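
The relationship between the four processes can be summarized compactly. The Python sketch below is offered only as an illustration of the description of FIG. 3 given above; the class name `Application` and its field names are assumptions introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """One of the processes A through D outlined in FIG. 3."""
    label: str
    primers_discriminate: bool  # "biased" primers overlapping CpG sites
    probe_discriminates: bool   # probe overlapping CpG sites

    def discrimination_level(self):
        """Where methylated-CpG sequence discrimination occurs."""
        if self.primers_discriminate and self.probe_discriminates:
            return "amplification and detection"
        if self.primers_discriminate:
            return "amplification"
        if self.probe_discriminates:
            return "detection"
        return "none (control for input DNA)"


APPLICATIONS = [
    Application("A", primers_discriminate=False, probe_discriminates=False),
    Application("B", primers_discriminate=False, probe_discriminates=True),
    Application("C", primers_discriminate=True, probe_discriminates=False),
    Application("D", primers_discriminate=True, probe_discriminates=True),
]

for app in APPLICATIONS:
    print(app.label, "->", app.discrimination_level())
```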

[0055] Process D. 

[0056] In a first embodiment (FIG. 3, Application D), the 
invention provides a method for qualitatively detecting a 
methylated CpG-containing nucleic acid, the method includ- 
ing: contacting a nucleic acid-containing sample with a 
modifying agent that modifies unmethylated cytosine to 
produce a converted nucleic acid; amplifying the converted 
nucleic acid by means of two oligonucleotide primers in the 
presence of a specific oligonucleotide hybridization probe, 
wherein both the primers and probe distinguish between 
modified, unmethylated and methylated nucleic acid; and 
detecting the "methylated" nucleic acid based on amplifi- 
cation-mediated probe displacement. 

[0057] The term "modifies" as used herein means the 
conversion of an unmethylated cytosine to another nucle- 
otide by the modifying agent, said conversion distinguishing 
unmethylated from methylated cytosine in the original 
nucleic acid sample. Preferably, the agent modifies unm- 
ethylated cytosine to uracil. Preferably, the agent used for 
modifying unmethylated cytosine is sodium bisulfite; how-
ever, other equivalent modifying agents that selectively
modify unmethylated cytosine, but not methylated cytosine,
can be substituted in the method of the invention. Sodium
bisulfite readily reacts with the 5,6-double bond of cytosine,
but not with methylated cytosine, to produce a sulfonated
cytosine intermediate that undergoes deamination under
alkaline conditions to produce uracil (Example 1). Because
Taq polymerase recognizes uracil as thymine and 5-meth-
ylcytidine (m5C) as cytidine, the sequential combination of
sodium bisulfite treatment and PCR amplification results in
the ultimate conversion of unmethylated cytosine residues to
thymine (C→U→T) and methylated cytosine residues
("mC") to cytosine (mC→mC→C). Thus, sodium bisulfite
treatment of genomic DNA creates methylation-dependent
sequence differences by converting unmethylated cytosines
to uracil, and upon PCR the resultant product contains
cytosine only at positions where methylated cytosine occurs
in the unmodified nucleic acid.
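
The two-step conversion described in this paragraph (C→U→T for unmethylated cytosine, mC→mC→C for methylated cytosine) can be illustrated on a short sequence. The following Python sketch is a toy model only: the example sequence, the set of methylated positions, and the function names `bisulfite_convert` and `after_pcr` are hypothetical, and it models a single strand rather than the chemistry of the reaction.

```python
def bisulfite_convert(seq, methylated_positions):
    """Convert unmethylated C to U; leave methylated C unchanged.

    seq is a single-stranded DNA sequence; methylated_positions holds
    the 0-based indices of the cytosines that carry a methyl group.
    """
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def after_pcr(converted):
    """Read uracil as thymine, as the polymerase does during PCR."""
    return converted.replace("U", "T")


seq = "ACGTCCGA"   # hypothetical sequence with CpGs at indices 1 and 5
methylated = {1}   # only the first CpG cytosine is methylated

converted = bisulfite_convert(seq, methylated)
amplified = after_pcr(converted)
print(converted)
print(amplified)
```

In the amplified sequence, cytosine survives only at the index that was marked as methylated, which is the sequence difference that the primers and probes are designed to detect.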

[0058] Oligonucleotide "primers," as used herein, means 
linear, single-stranded, oligomeric deoxyribonucleic or ribo-
nucleic acid molecules capable of sequence-specific hybrid-
ization (annealing) with complementary strands of modified
or unmodified nucleic acid. As used herein, the specific 
primers are preferably DNA. The primers of the invention 
embrace oligonucleotides of appropriate sequence and suf- 
ficient length so as to provide for specific and efficient 
initiation of polymerization (primer extension) during the 
amplification process. As used in the inventive processes, 
oligonucleotide primers typically contain 12-30 nucleotides 
or more, although they may contain fewer nucleotides. Prefer-
ably, the primers contain from 18-30 nucleotides. The exact
length will depend on multiple factors including temperature 
(during amplification), buffer, and nucleotide composition. 
Preferably, primers are single-stranded although double- 
stranded primers may be used if the strands are first sepa- 
rated. Primers may be prepared using any suitable method, 
such as conventional phosphotriester and phosphodiester 
methods or automated embodiments which are commonly 
known in the art. 

[0059] As used in the inventive embodiments herein, the 
specific primers are preferably designed to be substantially 
complementary to each strand of the genomic locus of 
interest. Typically, one primer is complementary to the
negative (-) strand of the locus (the "lower" strand of a
horizontally situated double-stranded DNA molecule) and
the other is complementary to the positive (+) strand
("upper" strand). As used in the embodiment of Application
D, the primers are preferably designed to overlap potential
sites of DNA methylation (CpG dinucleotides) and specifically
distinguish modified unmethylated from methylated DNA.
Preferably, this sequence discrimination is based upon the 
differential annealing temperatures of perfectly matched, 
versus mismatched oligonucleotides. In the embodiment of 
Application D, primers are typically designed to overlap 
from one to several CpG sequences. Preferably, they are 
designed to overlap from 1 to 5 CpG sequences, and most 
preferably from 1 to 4 CpG sequences. By contrast, in a 
quantitative embodiment of the invention, the primers do not 
overlap any CpG sequences. 

[0060] In the case of fully "unmethylated" (complemen- 
tary to modified unmethylated nucleic acid strands) primer 
sets, the anti-sense primers contain adenosine residues
("As") in place of guanosine residues ("Gs") in the corre-
sponding (-) strand sequence. These substituted As in the
anti-sense primer will be complementary to the uracil and 
thymidine residues ("Us" and "Ts") in the corresponding (+) 
strand region resulting from bisulfite modification of unm- 
ethylated C residues ("Cs") and subsequent amplification. 
The sense primers, in this case, are preferably designed to be 
complementary to anti-sense primer extension products, and 
contain Ts in place of unmethylated Cs in the corresponding 
(+) strand sequence. These substituted Ts in the sense primer 
will be complementary to the As, incorporated in the anti- 
sense primer extension products at positions complementary 
to modified Cs (Us) in the original (+) strand. 

[0061] In the case of fully-methylated primers (comple-
mentary to methylated CpG-containing nucleic acid
strands), the anti-sense primers will not contain As in place
of Gs in the corresponding (-) strand sequence that are
complementary to methylated Cs (i.e., mCpG sequences) in
the original (+) strand. Similarly, the sense primers in this
case will not contain Ts in place of methylated Cs in the
corresponding (+) strand mCpG sequences. However, Cs
that are not in CpG sequences in regions covered by the
fully-methylated primers, and are not methylated, will be
represented in the fully-methylated primer set as described
above for unmethylated primers.
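
The base substitutions described for the fully unmethylated and fully-methylated primer sets can be sketched in a few lines of Python. This is a hedged illustration rather than a primer design tool: the function names and the six-base example region are hypothetical, and a C at the last position of a region is treated as non-CpG because the following base is not visible to the function.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def converted_plus_strand(region, methylated):
    """(+) strand region as it reads after bisulfite treatment and PCR.

    Every C becomes T, except that when `methylated` is True a C
    immediately followed by G (a CpG) is treated as methylated and kept.
    """
    out = []
    for i, base in enumerate(region):
        in_cpg = base == "C" and region[i + 1:i + 2] == "G"
        if base == "C" and not (methylated and in_cpg):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def sense_primer(region, methylated):
    """Sense primer: Ts in place of unmethylated Cs of the (+) strand."""
    return converted_plus_strand(region, methylated)


def antisense_primer(region, methylated):
    """Anti-sense primer: reverse complement of the converted (+) strand,
    so it carries As in place of Gs opposite converted Cs."""
    converted = converted_plus_strand(region, methylated)
    return "".join(COMPLEMENT[b] for b in reversed(converted))


region = "ACGTCA"  # hypothetical (+) strand region with one CpG
print(sense_primer(region, methylated=False))
print(sense_primer(region, methylated=True))
print(antisense_primer(region, methylated=False))
print(antisense_primer(region, methylated=True))
```

Note that the non-CpG cytosine in the example is converted in both primer sets, in keeping with the final sentence of the paragraph above.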

[0062] Preferably, as employed in the embodiment of 
Application D, the amplification process provides for ampli- 
fying bisulfite converted nucleic acid by means of two 
oligonucleotide primers in the presence of a specific oligo- 
nucleotide hybridization probe. Both the primers and probe 
distinguish between modified unmethylated and methylated 
nucleic acid. Moreover, detecting the "methylated" nucleic 
acid is based upon amplification-mediated probe fluores- 
cence. In one embodiment, the fluorescence is generated by 
probe degradation by 5' to 3' exonuclease activity of the 
polymerase enzyme. In another embodiment, the fluores- 
cence is generated by fluorescence energy transfer effects 
between two adjacent hybridizing probes (Lightcycler® 
technology) or between a hybridizing probe and a primer. In 
another embodiment, the fluorescence is generated by the 
primer itself (Sunrise® technology). Preferably, the ampli- 
fication process is an enzymatic chain reaction that uses the 
oligonucleotide primers to produce exponential quantities of 
amplification-product, from a target locus, relative to the 
number of reaction steps involved. 

[0063] As described above, one member of a primer set is
complementary to the (-) strand, while the other is comple- 
mentary to the (+) strand. The primers are chosen to bracket 
the area of interest to be amplified; that is, the "amplicon." 
Hybridization of the primers to denatured target nucleic acid 
followed by primer extension with a DNA polymerase and 
nucleotides, results in synthesis of new nucleic acid strands 
corresponding to the amplicon. Preferably, the DNA poly-
merase is Taq polymerase, as commonly used in the art,
although equivalent polymerases with a 5' to 3' nuclease
activity can be substituted. Because the new amplicon
sequences are also templates for the primers and poly-
merase, repeated cycles of denaturing, primer annealing, and
extension result in exponential production of the amplicon.
The product of the chain reaction is a discrete nucleic acid 
duplex, corresponding to the amplicon sequence, with ter- 
mini defined by the ends of the specific primers employed. 
Preferably the amplification method used is that of PCR 
(Mullis et al., Cold Spring Harb. Symp. Quant. Biol. 51:263-
273; Gibbs, Anal. Chem. 62:1202-1214, 1990), or more
preferably, automated embodiments thereof which are com- 
monly known in the art. 
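
The exponential production of the amplicon noted above can be made concrete with an idealized calculation. The Python sketch below assumes a constant per-cycle efficiency, which is a simplification introduced here for illustration (real reactions plateau in later cycles); the function name `amplicon_copies` and the `efficiency` parameter are hypothetical.

```python
def amplicon_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon copy number after a number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle;
    1.0 corresponds to perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Perfect doubling from a single template over 10 cycles.
print(amplicon_copies(1, 10))
# A less efficient reaction yields fewer copies in the same cycles.
print(amplicon_copies(1, 10, efficiency=0.9))
```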

[0064] Preferably, methylation-dependent sequence dif-
ferences are detected by methods based on fluorescence-
based quantitative PCR (real-time quantitative PCR, Heid et
al., Genome Res. 6:986-994, 1996; Gibson et al., Genome
Res. 6:995-1001, 1996) (e.g., "TaqMan®," "Lightcycler®,"
and "Sunrise®" technologies). For the TaqMan® and Light-
cycler® technologies, the sequence discrimination can occur
at either or both of two steps: (1) the amplification step, or
(2) the fluorescence detection step. In the case of the
"Sunrise®" technology, the amplification and fluorescent
steps are the same. In the case of the FRET hybridization
probes format on the Lightcycler®, either or both of the
FRET oligonucleotides can be used to distinguish the
sequence difference. Most preferably the amplification pro-
cess, as employed in all inventive embodiments herein, is
that of fluorescence-based Real Time Quantitative PCR
(Heid et al., Genome Res. 6:986-994, 1996) employing a
dual-labeled fluorescent oligonucleotide probe (TaqMan®
PCR, using an ABI Prism 7700 Sequence Detection System,
Perkin Elmer Applied Biosystems, Foster City, Calif.).

[0065] The "TaqMan®" PCR reaction uses a pair of
amplification primers along with a nonextendible interro-
gating oligonucleotide, called a TaqMan® probe, that is
designed to hybridize to a GC-rich sequence located
between the forward and reverse (i.e., sense and anti-sense)
primers. The TaqMan® probe further comprises a fluores-
cent "reporter moiety" and a "quencher moiety" covalently
bound to linker moieties (e.g., phosphoramidites) attached to
nucleotides of the TaqMan® oligonucleotide. Examples of
suitable reporter and quencher molecules are: the 5' fluo-
rescent reporter dyes 6FAM ("FAM"; 2,7 dimethoxy-4,5-
dichloro-6-carboxy-fluorescein), and TET (6-carboxy-4,7,2',
7'-tetrachlorofluorescein); and the 3' quencher dye TAMRA
(6-carboxytetramethylrhodamine) (Livak et al., PCR Meth-
ods Appl. 4:357-362, 1995; Gibson et al., Genome Res.
6:995-1001, 1996; and Heid et al., Genome Res. 6:986-994,
1996).

[0066] One process for designing appropriate TaqMan®
probes involves utilizing a software facilitating tool, such as
"Primer Express" that can determine the variables of CpG
island location within GC-rich sequences to provide for at
least a 10° C. melting temperature difference (relative to the
primer melting temperatures) due to either specific sequence
(tighter bonding of GC, relative to AT base pairs), or to
primer length.
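
The 10° C. design criterion can be illustrated with a rough check. The Python sketch below does not reproduce the algorithm used by "Primer Express", which is not described here; as a stand-in it uses a common composition-based approximation, Tm = 64.9 + 41 × (G + C − 16.4) / N, and the sequences, function names and `margin` parameter are all hypothetical.

```python
def basic_tm(seq):
    """Rough melting temperature in deg C from base composition only.

    Uses the approximation Tm = 64.9 + 41 * (G + C - 16.4) / N, which
    ignores salt, oligonucleotide concentration and nearest-neighbor
    effects.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def probe_tm_margin_ok(probe, primers, margin=10.0):
    """True if the probe melts at least `margin` deg C above every primer."""
    return all(basic_tm(probe) - basic_tm(p) >= margin for p in primers)


# Hypothetical sequences chosen only to exercise the check.
probe = "GCGCGGCCGCGGGCGCCGGCGCGG"    # 24-mer, GC-rich
forward = "ATGCATGCATGCATGCATGC"      # 20-mer, 50% GC
reverse = "TTGCATACATGCATGAATGC"      # 20-mer, 40% GC

print(round(basic_tm(probe), 1), round(basic_tm(forward), 1))
print(probe_tm_margin_ok(probe, [forward, reverse]))
```

Both routes named in the text, a higher GC content and a greater length, raise the estimate produced by this approximation.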

[0067] The TaqMan® probe may or may not cover known
CpG methylation sites, depending on the particular inven-
tive process used. Preferably, in the embodiment of Appli-
cation D, the TaqMan® probe is designed to distinguish
between modified unmethylated and methylated nucleic acid
by overlapping from 1 to 5 CpG sequences. As described
above for the fully unmethylated and fully methylated
primer sets, TaqMan® probes may be designed to be
complementary to either unmodified nucleic acid, or, by
appropriate base substitutions, to bisulfite-modified
sequences that were either fully unmethylated or fully
methylated in the original, unmodified nucleic acid sample.

[0068] Each oligonucleotide primer or probe in the Taq- 
Man® PCR reaction can span anywhere from zero to many 
different CpG dinucleotides that each can result in two 
different sequence variations following bisulfite treatment 
(mCpG, or UpG). For instance, if an oligonucleotide spans 
3 CpG dinucleotides, then the number of possible sequence 
variants arising in the genomic DNA is 2^3=8 different 
sequences. If the forward and reverse primer each span 3 
CpGs and the probe oligonucleotide (or both oligonucle- 
otides together in the case of the FRET format) spans 
another 3, then the total number of sequence permutations 
becomes 8×8×8=512. In theory, one could design separate 
PCR reactions to quantitatively analyze the relative amounts 
of each of these 512 sequence variants. In practice, a 
substantial amount of qualitative methylation information 
can be derived from the analysis of a much smaller number 
of sequence variants. Thus, in its most simple form, the 
inventive process can be performed by designing reactions 
for the fully methylated and the fully unmethylated variants 
that represent the most extreme sequence variants in a 
hypothetical example (see FIG. 3, Application D). The ratio 
between these two reactions, or alternatively the ratio 
between the methylated reaction and a control reaction 
(FIG. 3, Application A), would provide a measure for the 
level of DNA methylation at this locus. A more detailed 
overview of the qualitative version is shown in FIG. 4. 
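
The variant counts in the preceding paragraph can be checked with a short sketch. It is only an illustration of the arithmetic: the function names and the "M"/"U" labels (methylated CpG versus unmethylated, read as UpG after bisulfite treatment) are assumptions introduced here and are not part of the described process.

```python
from itertools import product


def methylation_variants(n_cpg):
    """Return every methylation pattern for n_cpg CpG sites.

    'M' marks a methylated CpG (unchanged by bisulfite treatment),
    'U' marks an unmethylated CpG (read as UpG after treatment).
    """
    return ["".join(pattern) for pattern in product("MU", repeat=n_cpg)]


def total_permutations(cpgs_per_oligo):
    """Multiply 2**n over all oligonucleotides used in one reaction."""
    total = 1
    for n in cpgs_per_oligo:
        total *= 2 ** n
    return total


variants = methylation_variants(3)
print(len(variants))                  # 8
print(variants[0], variants[-1])      # MMM UUU
print(total_permutations([3, 3, 3]))  # 512
```

The first and last patterns ("MMM" and "UUU") correspond to the fully methylated and fully unmethylated extremes that the simplest form of the process targets.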

[0069] Detection of methylation in the embodiment of 
Application D, as in other embodiments herein, is based on 
amplification-mediated displacement of the probe. In theory, 
the process of probe displacement might be designed to 
leave the probe intact, or to result in probe digestion. 
Preferably, as used herein, displacement of the probe occurs 
by digestion of the probe during amplification. During the 
extension phase of the PCR cycle, the fluorescent hybrid- 
ization probe is cleaved by the 5' to 3' nucleolytic activity of 
the DNA polymerase. On cleavage of the probe, the reporter 
moiety emission is no longer transferred efficiently to the 
quenching moiety, resulting in an increase of the reporter 
moiety fluorescent-emission spectrum at 518 nm. The fluo- 
rescent intensity of the quenching moiety (e.g., TAMRA) 
changes very little over the course of the PCR amplification. 
Several factors may influence the efficiency of TaqMan® 
PCR reactions including: magnesium and salt concentra- 
tions; reaction conditions (time and temperature); primer 
sequences; and PCR target size (i.e., amplicon size) and 
composition. Optimization of these factors to produce the 
optimum fluorescence intensity for a given genomic locus is 
obvious to one skilled in the art of PCR, and preferred 
conditions are further illustrated in the "Examples" herein. 
The amplicon may range in size from 50 to 8,000 base 
pairs, or larger, but may be smaller. Typically, the amplicon 
is from 100 to 1000 base pairs, and preferably is from 100 
to 500 base pairs. Preferably, the reactions are monitored in 
real time by performing PCR amplification using 96-well 
optical trays and caps, and using a sequence detector (ABI 
Prism) to allow measurement of the fluorescent spectra of all 
96 wells of the thermal cycler continuously during the PCR 
amplification. Preferably, process D is run in combination 
with the process A (FIG. 3) to provide controls for the 
amount of input nucleic acid, and to normalize data from 
tray to tray. 

[0070] Application C. 

[0071] The inventive process can be modified to avoid 
sequence discrimination at the PCR product detection level. 
Thus, in an additional qualitative process embodiment (FIG. 
3, Application C), just the primers are designed to cover 
CpG dinucleotides, and sequence discrimination occurs 
solely at the level of amplification. Preferably, the probe 
used in this embodiment is still a TaqMan® probe, but is 
designed so as not to overlap any CpG sequences present in 
the original, unmodified nucleic acid. The embodiment of 
Application C represents a high-throughput, fluorescence- 
based real-time version of MSP technology, wherein a 
substantial improvement has been attained by reducing the 
time required for detection of methylated CpG sequences. 
Preferably, the reactions are monitored in real time by 
performing PCR amplification using 96-well optical trays 
and caps, and using a sequence detector (ABI Prism) to 
allow measurement of the fluorescent spectra of all 96 wells 
of the thermal cycler continuously during the PCR ampli- 
fication. Preferably, process C is run in combination with 
process A to provide controls for the amount of input nucleic 
acid, and to normalize data from tray to tray. 

[0072] Application B. 

[0073] The inventive process can also be modified to 
avoid sequence discrimination at the PCR amplification 
level (FIG. 3, A and B). In a quantitative process embodi- 
ment (FIG. 3, Application B), just the probe is designed to 
cover CpG dinucleotides, and sequence discrimination 
occurs solely at the level of probe hybridization. Preferably, 
TaqMan® probes are used. In this version, sequence variants 
resulting from the bisulfite conversion step are amplified 
with equal efficiency, as long as there is no inherent ampli- 
fication bias (Warnecke et al., Nucleic Acids Res. 25:4422- 
4426, 1997). Design of separate probes for each of the 
different sequence variants associated with a particular 
methylation pattern (e.g., 2^3=8 probes in the case of 3 CpGs) 
would allow a quantitative determination of the relative 
prevalence of each sequence permutation in the mixed pool 
of PCR products. Preferably, the reactions are monitored in 
real time by performing PCR amplification using 96-well 
optical trays and caps, and using a sequence detector (ABI 
Prism) to allow measurement of the fluorescent spectra of all 
96 wells of the thermal cycler continuously during the PCR 
amplification. Preferably, process B is run in combination 
with process A to provide controls for the amount of input 
nucleic acid, and to normalize data from tray to tray. 

[0074] Application A. 

[0075] Process A (FIG. 3) does not, in itself, provide for 
methylated-CpG sequence discrimination at either the 
amplification or detection levels, but supports and validates 
the other three applications by providing control reactions 
for the amount of input DNA, and to normalize data from 
tray to tray. Thus, if neither the primers, nor the probe 
overlie any CpG dinucleotides, then the reaction represents 
unbiased amplification and measurement of amplification 
using fluorescent-based quantitative real-time PCR serves as 
a control for the amount of input DNA (FIG. 3, Application 
A). Preferably, process A not only lacks CpG dinucleotides 
in the primers and probe(s), but also does not contain any 
CpGs within the amplicon at all to avoid any differential 
effects of the bisulfite treatment on the amplification process. 
Preferably, the amplicon for process A is a region of DNA 
that is not frequently subject to copy number alterations, 
such as gene amplification or deletion. 

[0076] Results obtained with the qualitative version of the 
technology are described in the examples below. Dozens of 
human tumor samples have been analyzed using this tech- 
nology with excellent results. High-throughput operation 
allowed performance of 1100 analyses in three days with 
one TaqMan® machine. 

EXAMPLE 1 

[0077] An initial experiment was performed to validate the 
inventive strategy for assessment of the methylation status 
of CpG islands in genomic DNA. This example shows a 
comparison between human sperm DNA (known to be 
highly unmethylated) and HCT116 DNA (from a human 
colorectal cell line, known to be highly methylated at many 
CpG sites) with respect to the methylation status of specific, 
hypermethylatable CpG islands in four different genes. 
COBRA (combined bisulfite restriction analysis; Xiong and 
Laird, Nucleic Acids Res. 25:2532-2534, 1997) was used as 
an independent measure of methylation status. 

[0078] DNA Isolation and Bisulfite Treatment. 

[0079] Briefly, genomic DNA was isolated from human 
sperm or HCT116 cells by the standard method of proteinase 
K digestion and phenol-chloroform extraction (Wolf et al., 
Am. J. Hum. Genet. 51:478-485, 1992). The DNA was then 
treated with sodium bisulfite by initially denaturing in 0.2 M 
NaOH, followed by addition of sodium bisulfite and hydro- 
quinone (to final concentrations of 3.1 M and 0.5 M, 
respectively), incubation for 16 h at 55° C., desalting (DNA 
Clean-Up System; Promega), desulfonation by 0.3 M NaOH, 
and final ethanol precipitation (Xiong and Laird, supra, 
citing Sadri and Hornsby, Nucleic Acids Res. 24:5058-5059, 
1996; see also Frommer et al., Proc. Natl. Acad. Sci. USA 
89:1827-1831, 1992). After bisulfite treatment, the DNA 
was subjected either to COBRA analysis as previously 
described (Xiong and Laird, supra), or to the inventive 
amplification process using fluorescence-based real-time 
quantitative PCR (Heid et al., Genome Res. 6:986-994, 
1996; Gibson et al., Genome Res. 6:995-1001, 1996). 

[0080] COBRA and MsSNuPE Reactions. 

[0081] ESR1 and APC genes were analyzed using 
COBRA (Combined Bisulfite Restriction Analysis). For 
COBRA analysis, methylation-dependent sequence differ- 
ences were introduced into the genomic DNA by standard 
bisulfite treatment according to the procedure described by 
Frommer et al. (Proc. Natl. Acad. Sci. USA 89:1827-1831, 
1992) (1 μg of salmon sperm DNA was added as a carrier 
before the genomic DNA was treated with sodium bisulfite). 
PCR amplification of the bisulfite-converted DNA was 
performed using primers specific for the CpG islands of 
interest, followed by restriction endonuclease digestion, gel 
electrophoresis, and detection using specific, labeled hybrid- 
ization probes. The forward and reverse primer sets used for 
the ESR1 and APC genes are: TCCTAAAACTACACT- 
TACTCC [SEQ ID NO. 35], GGTTATTTGGAAAAAGAG- 
TATAG [SEQ ID NO. 36] (ESR1 promoter); and 
AGAGAGAAGTAGTTGTGTTAAT [SEQ ID NO. 37], 
ACTACACCAATACAACCACAT [SEQ ID NO. 38] (APC 
promoter), respectively. PCR products of ESR1 were 
digested by restriction endonucleases TaqI and BstUI, while 
the products from APC were digested by TaqI and SfaNI, 
to measure methylation of 3 CpG sites for APC and 4 CpG 
sites for ESR1. The digested PCR products were electro- 
phoresed on denaturing polyacrylamide gel and transferred 
to nylon membrane (Zetabind; American Bioanalytical) by 
electroblotting. The membranes were hybridized by a 5'-end 
labeled oligonucleotide to visualize both digested and undi- 
gested DNA fragments of interest. The probes used are as 
follows: ESR1, AAACCAAAACTC [SEQ ID NO. 39]; and 
APC, CCCACACCCAACCAAT [SEQ ID NO. 40]. Quan- 
titation was performed with the Phosphoimager 445SI 
(Molecular Dynamics). Calculations were performed in 
Microsoft Excel. The level of DNA methylation at the 
investigated CpG sites was determined by calculating the 
percentage of the digested PCR fragments (Xiong and Laird, 
supra). 
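
The percentage calculation mentioned at the end of the preceding paragraph can be sketched as follows. This is a minimal illustration, not the spreadsheet actually used: it assumes the methylation level is the digested signal divided by the sum of digested and undigested signal, and the function name and the intensity values are hypothetical.

```python
def percent_methylation(digested, undigested):
    """Percentage of PCR product cut by the restriction enzyme.

    digested   -- summed signal of the digested fragment bands
    undigested -- signal of the uncut band
    Both are hypothetical phosphorimager readings in arbitrary units.
    """
    total = digested + undigested
    if total <= 0:
        raise ValueError("no signal detected in either band")
    return 100.0 * digested / total


# Illustrative (made-up) band intensities, not data from this document.
print(percent_methylation(digested=380.0, undigested=620.0))  # 38.0
```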

[0082] MLH1 and CDKN2A were analyzed using MsS- 
NuPE (Methylation-sensitive Single Nucleotide Primer 
Extension Assay), performed as described by Gonzalgo and 
Jones (Nucleic Acids Res. 25:2529-2531). PCR amplifica- 
tion of the bisulfite-converted DNA was performed using 
primers specific for the CpG islands of interest, and detec- 
tion was performed using additional specific primers (exten- 
sion probes). The forward and reverse primer sets used for 
the MLH1 and CDKN2A genes are: GGAGGTTATAA- 
GAGTAGGGTTAA [SEQ ID NO. 41], CCAAC- 
CAATAAAAACAAAATACC [SEQ ID NO. 42] (MLH1 
promoter); GTAGGTGGGGAGGAGTTTAGTT [SEQ ID 
NO. 43], TCTAArAACCAACCAACCCCTCC [SEQ ID 
NO. 44] (CDKN2A promoter); and TTGTAT- 
TATTTTGTITTTTTTGGTAGG [SEQ ID NO. 45], 
CAACTTCTCAAArCATCAATCCTCAC [SEQ ID NO. 
46] (CDKN2A Exon 2), respectively. The MsSNuPE exten- 
sion probes are located immediately 5' of the CpG to be 
analyzed, and the sequences are: TTTAGTAGAGG- 
TATATAAGTT [SEQ ID NO. 47], TAAGGGGAGAGGAG- 
GAGTTTGAGAAG [SEQ ID NO. 48] (MLH1 promoter 
sites 1 and 2, respectively); TTTGAGGGATAGGGT [SEQ 
ID NO. 49], TTTTAGGGGTGTTATATT [SEQ ID NO. 50], 
TTTTTTTGTTTGGAAAGATAr [SEQ ID NO. 51] (pro- 
moter sites 1, 2, and 3, respectively); and GTTGGTGGT- 
GTTGTAT [SEQ ID NO. 52], AGGTTATGATGATGGG- 
TAG [SEQ ID NO. 53], TATTAGAGGTAGTAATTATGTT 
[SEQ ID NO. 54] (Exon 2 sites 1, 2, and 3, respectively). A 
pair of reactions was set up for each sample using either 
32P-dCTP or 32P-dTTP for single nucleotide extension. The 
extended MsSNuPE primers (probes) were separated by 
denaturing polyacrylamide gel. Quantitation was performed 
using the Phosphoimager. 

[0083] Inventive Methylation Analysis. 

[0084] Bisulfite-converted genomic DNA was amplified 
using locus-specific PCR primers flanking an oligonucle- 
otide probe with a 5' fluorescent reporter dye (6FAM) and a 
3' quencher dye (TAMRA) (Livak et al., PCR Methods Appl. 
4:357-362, 1995) (primers and probes used for the methy- 
lation analyses are listed under "Genes, MethyLight Primers 
and Probe Sequences" herein, infra). In this example, the 
forward and reverse primers and the corresponding fluoro- 
genic probes were designed to discriminate between either 
fully methylated or fully unmethylated molecules of 
bisulfite-converted DNA (see discussion of primer design 
under "Detailed Description of the Invention, Process D" 
herein). Primers and a probe were also designed for a stretch 
of the MYOD1 gene (Myogenic Differentiation Gene), 
completely devoid of CpG dinucleotides as a control reac- 
tion for the amount of input DNA. Parallel reactions were 
performed using the inventive process with the methylated 
and unmethylated (D), or control oligos (A) on the bisulfite- 
treated sperm and HCT116 DNA samples. The values 
obtained for the methylated and unmethylated reactions 
were normalized to the values for the MYOD1 control 
reactions to give the ratios shown in Table 1 (below). 
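
The normalization step described above amounts to dividing each locus-specific value by the control value for the same sample. The sketch below illustrates this under stated assumptions: the function name, the optional scale factor for arbitrary units, and the example readings are all hypothetical and are not taken from this document.

```python
def normalized_ratio(target_value, control_value, scale=1.0):
    """Ratio of a locus-specific reaction to the control reaction.

    target_value  -- quantity measured in the methylated or
                     unmethylated reaction
    control_value -- quantity measured in the CpG-free control reaction
    scale         -- assumed multiplier for reporting in arbitrary units
    """
    if control_value <= 0:
        raise ValueError("control reaction gave no signal")
    return scale * target_value / control_value


# Made-up readings for one sample (not data from this document).
readings = {"methylated": 0.0, "unmethylated": 12.5}
control = 25.0
ratios = {name: normalized_ratio(value, control, scale=100.0)
          for name, value in readings.items()}
print(ratios)  # {'methylated': 0.0, 'unmethylated': 50.0}
```

Because both reactions are run on the same bisulfite-treated sample, dividing by the control value removes differences in the amount of input DNA.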

[0085] In a TaqMan® protocol, the 5' to 3' nuclease 
activity of Taq DNA polymerase cleaved the probe and 
released the reporter, whose fluorescence was detected by 
the laser detector of the ABI Prism 7700 Sequence Detection 
System (Perkin-Elmer, Foster City, Calif.). After crossing a 
fluorescence detection threshold, the PCR amplification 
resulted in a fluorescent signal proportional to the amount of 
PCR product generated. Initial template quantity can be 
derived from the cycle number at which the fluorescent 
signal crosses a threshold in the exponential phase of the 
PCR reaction. Several reference samples were included on 
each assay plate to verify plate-to-plate consistency. Plates 
were normalized to each other using these reference 
samples. The PCR amplification was performed using a 
96-well optical tray and caps with a final reaction mixture of 
25 μl consisting of 600 nM each primer, 200 nM probe, 200 
μM each dATP, dCTP, dGTP, 400 μM dUTP, 5.5 mM MgCl2, 
1× TaqMan® Buffer A containing a reference dye, and 
bisulfite-converted DNA or unconverted DNA at the follow- 
ing conditions: 50° C. for 2 min, 95° C. for 10 min, followed 
by 40 cycles at 95° C. for 15 s and 60° C. for 1 min. 
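
The threshold-crossing idea in the preceding paragraph can be sketched as follows. This is an illustration only, assuming one fluorescence reading per cycle and linear interpolation between the two readings that bracket the threshold; it is not the algorithm of the sequence detection instrument, and the function name and the example curve are hypothetical.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the signal first reaches
    the threshold, or None if it never does.

    fluorescence[i] is the reading taken after cycle i + 1.
    """
    previous = None
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            if previous is None:
                # Already at or above threshold in the first cycle.
                return float(cycle)
            # Interpolate between the previous cycle and this one.
            return (cycle - 1) + (threshold - previous) / (value - previous)
        previous = value
    return None


# Made-up curve that doubles each cycle once amplification is detectable.
curve = [1.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(threshold_cycle(curve, threshold=6.0))  # 4.5
```

A sample with more starting template crosses the threshold at an earlier cycle, which is why the crossing point can serve as a measure of initial template quantity.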

[0086] Genes, MethyLight Primers and Probe Sequences. 

[0087] Four human genes were chosen for analysis: (1) 
APC (adenomatous polyposis coli) (Hiltunen et al., Int. J. 
Cancer 70:644-648, 1997); (2) ESR1 (estrogen receptor) 
(Issa et al., Nature Genet. 7:536-40, 1994); (3) CDKN2A 
(p16) (Ahuja et al., Cancer Res. 57:3370-3374, 1997); and (4) 
hMLH1 (mismatch repair) (Herman et al., Proc. Natl. Acad. 
Sci. USA 95:6870-6875, 1998; Veigl et al., Proc. Natl. Acad. 
Sci. USA 95:8698-8702, 1998). These genes were chosen 
because they contain hypermethylatable CpG islands that 
are known to undergo de novo methylation in human col- 
orectal tissue in all normal and tumor samples. The human 
APC gene, for example, has been linked to the development 
of colorectal cancer, and CpG sites in the regulatory 
sequences of the gene are known to be distinctly more 
methylated in colon carcinomas, but not in premalignant 
adenomas, relative to normal colonic mucosa (Hiltunen et 
al., supra). The human ESR gene contains a CpG island at 
its 5' end, which becomes increasingly methylated in col- 
orectal mucosa with age and is heavily methylated in all 
human colorectal tumors analyzed (Issa et al., supra). Hyper- 
methylation of promoter-associated CpG islands of the 
CDKN2A (p16) gene has been found in 60% of colorectal 
cancers showing microsatellite instability (MI) due to 
defects in one of several base mismatch repair genes (Ahuja 
et al., supra). The mismatch repair gene MLH1 plays a 
pivotal role in the development of sporadic cases of mis- 
match repair-deficient colorectal tumors (Thibodeau et al., 
Science 260:816-819, 1993). It has been reported that 
MLH1 can become transcriptionally silenced by DNA 
hypermethylation of its promoter region, leading to micro- 
satellite instability (MSI) (Kane et al., Cancer Res. 57:808- 
811, 1997; Ahuja et al., supra; Cunningham et al., Cancer 
Res. 58:3455-3460, 1998; Herman et al., supra; Veigl et al., 
supra). 

[0088] Five sets of PCR primers and probes, designed 
specifically for bisulfite-converted DNA sequences, were 
used: (1) a set representing fully methylated and fully 
unmethylated DNA for the ESR1 gene; (2) a fully methy- 
lated set for the MLH1 gene; (3) a fully methylated and fully 
unmethylated set for the APC gene; (4) a fully methy- 
lated and fully unmethylated set for the CDKN2A (p16) 
gene; and (5) an internal reference set for the MYOD1 gene 
to control for input DNA. The methylated and unmethylated 
primers and corresponding probes were designed to overlap 
1 to 5 potential CpG dinucleotide sites. The MYOD1 
internal reference primers and probe were designed to cover 
a region of the MYOD1 gene completely devoid of any CpG 
dinucleotides to allow for unbiased PCR amplification of the 
genomic DNA, regardless of methylation status. As indi- 
cated above, parallel TaqMan® PCR reactions were per- 
formed with primers specific for the bisulfite-converted 
methylated and/or unmethylated gene sequences and with 
the MYOD1 reference primers. The primer and probe 
sequences are listed below. In all cases, the first primer listed 
is the forward PCR primer, the second is the TaqMan® 
probe, and the third is the reverse PCR primer. ESR1 
methylated (GGGGTTGGTTTTGGGATTG [SEQ ID NO. 
1], 6FAM 5'-GGATAAAAGGGAAGGAGGGGAGGA-3' 
TAMRA [SEQ ID NO. 2], GGGGAGAGGGGAAGTGTAA 
[SEQ ID NO. 3]); ESR1 unmethylated (ACACATATC- 
GGAGCAAGACACAA [SEQ ID NO. 4], 6FAM 5'-CAAC- 
CCTACCCCAAAAACCTACAAArCCAA-3' TAMRA 
[SEQ ID NO. 5], AGGAGTTGGTGGAGGGTGTTT [SEQ 
ID NO. 6]); MLH1 methylated (GTATGGCCGCGT- 
CATCGT [SEQ ID NO. 7], 6FAM 5'-CGCGACGT- 
GAAACGCCACTACG-3' TAMRA [SEQ ID NO. 8], GGT- 
TATArATGGTTGGTAGTAITGGTGTTT [SEQ ID NO. 
9]); APC methylated (TTATATGTGGGTTAGGTGGGTr- 
TATAT [SEQ ID NO. 10], 6FAM 5'-GGGGTGGAAAAG- 
GGGGGGATTA-3' TAMRA [SEQ ID NO. 11], GAAG- 
GAAAAGGGTGGGGAT [SEQ ID NO. 12]); APC 
unmethylated (GGGTTGTGAGGGTATATTTTTGAGG 
[SEQ ID NO. 13], 6FAM 5'-GGGAGGGAAGGAGAGAAG- 
CTACCTAACC-3' TAMRA [SEQ ID NO. 14], GCAAG- 
GGACACTCCACAATAAA [SEQ ID NO. 15]); CDKN2A 
methylated (AACAACGTCCGCACCTCCT [SEQ ID NO. 
16], 6FAM 5'-AGCGGACCCCGAACCGCG-3' TAMRA 
[SEQ ID NO. 17], TGGAATTTTGGGTTGATTGGTT 
[SEQ ID NO. 18]); CDKN2A unmethylated (GAAGGAAT- 
GAAGGAAAAATTGGAT [SEQ ID NO. 19], 6FAM 
5'-GGAGGAGGGAGTATGTAGTGTGGGGGTG-3' TAMRA 
[SEQ ID NO. 20], GGTGGATTGTGTGTGTTTGGTG 
[SEQ ID NO. 21]); and MYOD1 (GGAAGTGGAAATG- 
GGGTGTGTAT [SEQ ID NO. 22], 6FAM 5'-TGGGTTG- 
GTATTGGTAAATGGAAGGTAAATAGGTGG-3' TAMRA 
[SEQ ID NO. 23], TGATTAATTTAGAITGGGTTTA- 
GAGAAGGA [SEQ ID NO. 24]). 

[0089] Tables 1 and 2 show the results of the analysis of 
human sperm and HCT116 DNAs for methylation status of 
the CpG islands within the four genes: APC, ESR1, 
CDKN2A (p16), and hMLH1. The results are expressed as 
ratios between the methylated and unmethylated reactions 
and a control reaction (MYOD1). Table 1 shows that sperm 
DNA yielded a positive ratio only with the "unmethylated" 
primers and probe, consistent with the known unmethylated 
status of sperm DNA, and consistent with the percent 
methylation values determined by COBRA analysis. That is, 
priming on the bisulfite-treated DNA occurred from regions 
that contained unmethylated cytosine in CpG sequences in 
the corresponding genomic DNA, and hence were deami- 
nated (converted to uracil) by bisulfite treatment. 
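
The effect of bisulfite treatment on a sequence, as described above, can be simulated with a short sketch. It makes simplifying assumptions that are not statements from this document: conversion is complete, only cytosines in CpG context can be methylated, the two extreme cases (all CpGs methylated or none) are modeled, and converted uracil is written as T, as it is read after PCR. The function name and the example sequence are made up.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite conversion of one DNA strand (5' to 3').

    Unmethylated C is deaminated to uracil, written here as T.
    If methylated is True, every C followed by G (a CpG) is treated
    as 5-methylcytosine and left unchanged; all other C's convert.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        is_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (methylated and is_cpg):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


example = "ACGTCCGA"  # made-up sequence containing two CpG sites
print(bisulfite_convert(example, methylated=True))   # ACGTTCGA
print(bisulfite_convert(example, methylated=False))  # ATGTTTGA
```

The two outputs differ only at the CpG positions, which is the sequence difference that methylation-specific primers and probes are designed to exploit.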



TABLE 1 

             COBRA or     Methylated    Unmethylated 
Technique    Ms-SNuPE     Reaction*     Reaction* 

GENE 

APC          0%           0             49 
ESR1         0%           0             62 
CDKN2A       0%**         0             52 
hMLH1        ND           0             ND 

*The values do not represent percentages, but values in an arbitrary unit 
that can be compared quantitatively between different DNA samples for 
the same reaction, after normalization with a control gene. 
** Based on Ms-SNuPE. 



[0090] Table 2 shows the results of an analysis of HCT116 
DNA for methylation status of the CpG islands within the 
four genes: APC, ESR1, CDKN2A (p16), and hMLH1. The 
results are expressed as ratios between the methylation- 
specific reactions and a control reaction (MYOD1). For the 
ESR gene, a positive ratio was obtained only with the 
"methylated" primers and probe, consistent with the known 
methylated status of HCT116 DNA, and the COBRA analy- 
sis. For the CDKN2A gene, HCT116 DNA yielded positive 
ratios with both the "methylated" and "unmethylated" prim- 
ers and probe, consistent with the known methylated status 
of HCT116 DNA, and with the COBRA analysis that 
indicates only partial methylation of this region of the gene. 
By contrast, the APC gene gave positive results only with 
the unmethylated reaction. However, this is entirely consis- 
tent with the COBRA analysis, and indicates that this APC 
gene region is unmethylated in HCT116 DNA. This may 
indicate that the methylation state of this particular APC 
gene regulatory region in the DNA from the HCT116 cell 
line is more like that of normal colonic mucosa or prema- 
lignant adenomas rather than that of colon carcinomas 
(known to be distinctly more methylated). 

TABLE 2 

             COBRA and/or   Methylated    Unmethylated 
Technique    Ms-SNuPE       Reaction*     Reaction* 

GENE 

APC          2%             0             81 
ESR1         99%            36            0 
CDKN2A       38%**          222           26 
hMLH1        ND             0             ND 

*The values do not represent percentages, but values in an arbitrary unit 
that can be compared quantitatively between different DNA samples for 
the same reaction, after normalization with a control gene. 
** Based on Ms-SNuPE. 



EXAMPLE 2 

[0091] This example is a comparison of the inventive 
process (A and D in FIG. 3) with an independent COBRA 
method (see "Methods," above) to determine the methyla- 
tion status of a CpG island associated with the estrogen 
receptor (ESR1) gene in the human colorectal cell line 
HCT116 and in human sperm DNA. This CpG island has 
been reported to be highly methylated in HCT116 and 
unmethylated in human sperm DNA (Xiong and Laird, 
supra; Issa et al., supra). The COBRA analysis is described 
above. Two TaqI sites within this CpG island confirmed this, 
showing a lack of methylation in the sperm DNA and nearly 
complete methylation in HCT116 DNA (FIG. 5A). Addi- 
tionally, results using bisulfite-treated and untreated DNA 
were compared. 

[0092] For the analysis, fully "methylated" and fully "un- 
methylated" ESR1, and control MYOD1 primers and probes 
were designed as described above under "Example 1." Three 
separate reactions using either the "methylated," "unmethy- 
lated" or control oligos on both sperm and HCT116 DNA 
were performed. As in Example 1, above, the values 
obtained for the methylated and unmethylated reactions 
were normalized to the values for the MYOD1 control 
reactions to give the ratios shown in FIG. 5B. Sperm DNA 
yielded a positive ratio only with the unmethylated primers 
and probe, consistent with its unmethylated status. In con- 
trast, HCT116 DNA, with predominantly methylated ESR1 
alleles, generated a positive ratio only in the methylated 
reaction (FIG. 5B). Both the sperm and HCT116 DNA 
yielded positive values in the MYOD1 reactions, indicating 
that there was sufficient input DNA for each sample. As 
expected, the non-bisulfite converted DNA with either the 
methylated or unmethylated oligonucleotides (FIG. 5B) was 
not amplified. These results are consistent with the COBRA 
findings (FIG. 5A), suggesting that the inventive assay can 
discriminate between the methylated and unmethylated alle- 
les of the ESR1 gene. In addition, the reactions are specific 
to bisulfite-converted DNA, which precludes the generation 
of false positive results due to incomplete bisulfite conver- 
sion. 

EXAMPLE 3 

[0093] This example determined specificity of the inven- 
tive primers and probes. FIG. 6 shows a test of all possible 
combinations of primers and probes to further examine the 
specificity of the methylated and unmethylated oligonucle- 
otides on DNAs of known methylation status. Eight different 
combinations of the ESR1 "methylated" and "unmethy- 
lated" forward and reverse primers and probe (as described 
above in "Example 1") were tested 
in inventive assays on sperm and HCT116 DNA in dupli- 
cate. The assays were performed as described above in 
Example 1. Panel A (FIG. 6) shows the nomenclature used 
for the combinations of the ESR1 oligos. "U" refers to the 
oligo sequence that anneals with bisulfite-converted un- 
methylated DNA, while "M" refers to the methylated version. 
Position 1 indicates the forward PCR primer, position 2 the 
probe, and position 3 the reverse primer. The combinations 
used for the eight reactions are shown below each pair of 
bars, representing duplicate experiments. The results are 
expressed as ratios between the ESR1 values and the 
MYOD1 control values. Panel B represents an analysis of 
human sperm DNA. Panel C represents an analysis of DNA 
obtained from the human colorectal cancer cell line 
HCT116. 

[0094] Only the fully unmethylated (reaction 1) or fully 
methylated combinations (reaction 8) resulted in a positive 
reaction for the sperm and HCT116, respectively. The other 
combinations were negative, indicating that the PCR con- 
ditions do not allow for weak annealing of the mismatched 
oligonucleotides. This selectivity indicates that the inventive 
process can discriminate between fully methylated or unm- 
ethylated alleles with a high degree of specificity. 



EXAMPLE 4 

[0095] This example shows that the inventive process is 
reproducible. FIG. 7 illustrates an analysis of the methyla- 
tion status of the ESR1 locus in DNA samples derived from 
a primary colorectal adenocarcinoma and matched normal 
mucosa derived from the same patient (samples 10N and 
10T in FIG. 8) in order to study a heterogeneous population 
of methylated and unmethylated alleles. The colorectal 
tissue samples were collected as described in Example 5, 
below. In addition, the reproducibility of the inventive 
process was tested by performing eight independent reac- 
tions for each assay. The results for the ESR1 reactions and 
for the MYOD1 control reaction represent raw absolute 
values obtained for these reactions, rather than ratios, so that 
the standard errors of the individual reactions can be evalu- 
ated. The values have been plate-normalized, but not cor- 
rected for input DNA. The bars indicate the mean values 
obtained for the eight separate reactions. The error bars 
represent the standard error of the mean. 
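
The mean and standard error of the mean for a set of replicate reactions can be computed as in the sketch below. It is an illustration only: it assumes the standard error is the sample standard deviation divided by the square root of the number of replicates, and the eight replicate values are made up rather than taken from FIG. 7.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return mean(values), stdev(values) / sqrt(len(values))


# Eight made-up replicate measurements (not data from this document).
replicates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
average, sem = mean_and_sem(replicates)
print(average)        # 5.0
print(round(sem, 3))  # 0.756
```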

[0096] FIG. 7 shows that the mean value for the methy- 
lated reaction was higher in the tumor compared to the 
normal tissue whereas the unmethylated reaction showed the 
opposite result. The standard errors observed for the eight 
independent measurements were relatively modest and were 
comparable to those reported for other studies utilizing 
TaqMan® technology (Fink et al., Nature Med. 4:1329- 
1333, 1998). Some of the variability of the inventive process 
may have been a result of stochastic PCR amplification 
(PCR bias), which can occur at low template concentrations 
(Warnecke et al., Nucleic Acids Res. 25:4422-4426, 1997). In 
summary, these results indicate that the inventive process 
can yield reproducible results for complex, heterogeneous 
DNA samples. 

EXAMPLE 5 

[0097] This example shows a comparison of MLH1 
expression, microsatellite instability and MLH1 promoter 
methylation in 25 matched-paired human colorectal 
samples. The main benefit of the inventive process is the 
ability to rapidly screen human tumors for the methylation 
state of a particular locus. In addition, the analysis of DNA 
methylation as a surrogate marker for gene expression is a 
novel way to obtain clinically useful information about 
tumors. We tested the utility of the inventive process by 
interrogating the methylation status of the MLH1 promoter. 
The mismatch repair gene MLH1 plays a pivotal role in the 
development of sporadic cases of mismatch repair-deficient 
colorectal tumors (Thibodeau et al., Science 260:816-819, 
1993). It has been reported that MLH1 can become tran- 
scriptionally silenced by DNA hypermethylation of its pro- 
moter region, leading to microsatellite instability (MSI) 
(Kane et al., Cancer Res. 57:808-811, 1997; Ahuja et al., 
Cancer Res. 57:3370-3374, 1997; Cunningham et al., Cancer 
Res. 58:3455-3460, 1998; Herman, J. G. et al., Proc. Natl. 
Acad. Sci. USA 95:6870-6875, 1998; Veigl et al., Proc. Natl. 
Acad. Sci. USA 95:8698-8702, 1998). 

[0098] Using the high-throughput inventive process, as 
described in Example 1 (Application D), 50 samples consist- 
ing of 25 matched pairs of human colorectal adenocarcino- 
mas and normal mucosa were analyzed for the methylation 
status of the MLH1 CpG island. The expression levels of 
MLH1, normalized to ACTB (β-actin), were measured by 
quantitative RT-PCR (TaqMan®). Further- 
more, the microsatellite instability (MSI) status of each 
sample was analyzed by PCR of the BAT25 and BAT26 loci 
(Parsons et al., Cancer Res. 55:5548-5550, 1995). The 
twenty-five paired tumor and normal mucosal tissue samples 
were obtained from 25 patients with primary colorectal 
adenocarcinoma. The patients comprised 16 males and 9 
females, ranging in age from 39-88 years, with a mean age 
of 68.8. The mucosal distance from tumor to normal speci- 
mens was between 10 and 20 cm. Approximately 2 grams of 
the surgically removed tissue was immediately frozen in 
liquid nitrogen and stored at -80° C. until RNA and DNA 
isolation. 

[0099] Quantitative RT-PCR and Microsatellite Instability 
Analysis. 

[0100] The quantitation of mRNA levels was carried out 
using real-time fluorescence detection. The TaqMan® reac- 
tions were performed as described above for the assay, but 
with the addition of 1 U AmpErase uracil N-glycosylase. 
After RNA isolation, cDNA was prepared from each sample 
as previously described (Bender et al., Cancer Res. 58:95- 
101, 1998). Briefly, RNA was isolated by lysing tissue in 
buffer containing guanidine isothiocyanate (4 M), N-lauryl 
sarcosine (0.5%), sodium citrate (25 mM), and 2-mercap- 
toethanol (0.1 M), followed by standard phenol-chloroform 
extraction, and precipitation in 50% isopropanol/50% lysis 
buffer. To prepare cDNA, RNA samples were reverse- 
transcribed using random hexamers, deoxynucleotide triph- 
osphates, and Superscript II® reverse transcriptase (Life 
Technologies, Inc., Palo Alto, Calif.). The resulting cDNA 
was then amplified with primers specific for MLH1 and 
ACTB. Contamination of the RNA samples by genomic 
DNA was excluded by analysis of all RNA samples without 
prior cDNA conversion. Relative gene expression was deter- 
mined based on the threshold cycles (number of PCR cycles 
required for detection with a specific probe) of the MLH1 
gene and of the internal reference gene ACTB. The forward 
primer, probe and reverse primer sequences of the ACTB 
and MLH1 genes are: ACTB (TGAGCGCGGCTA- 
CAGCTT [SEQ ID NO. 25], 6FAM 5'-ACCACCACGGC- 
CGAGCGG-3' TAMRA [SEQ ID NO. 26], CCTTAATGT- 
CACACACGATT [SEQ ID NO. 27]); and MLH1 
(GTTCTCCGGGAGATGTTGCATA [SEQ ID NO. 28], 
6FAM 5'-CCTCAGTGGGCCTrGGCACAGC-3' TAMRA 
[SEQ ID NO. 29], TGGTGGTGTTGAGAAGG- 
TATAACTTG [SEQ ID NO. 30]). 
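
One common way to turn two threshold cycles into a relative expression value is the comparative threshold-cycle calculation sketched below. The text above does not state which formula was used, so this is an assumption for illustration: it presumes that the target and reference amplify with the same per-cycle efficiency (2.0 meaning perfect doubling). The function name and the example cycle numbers are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target gene relative to a reference gene.

    ct_target    -- threshold cycle of the target (e.g. MLH1)
    ct_reference -- threshold cycle of the reference (e.g. ACTB)
    efficiency   -- assumed fold increase in product per cycle
    """
    return efficiency ** (ct_reference - ct_target)


# Made-up threshold cycles: the target crosses three cycles later than
# the reference, so it is estimated at one eighth of the reference level.
print(relative_expression(ct_target=27.0, ct_reference=24.0))  # 0.125
```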

[0101] Alteration of numerous polyadenine ("pA") 
sequences, distributed widely throughout the genome, is a 
useful characteristic to define tumors with microsatellite 
instability (Ionov et al., Nature 363:558-561, 1993). Micro- 
satellite instability (MSI) was determined by PCR and 
sequence analysis of the BAT25 (25-base pair pA tract from 
an intron of the c-kit oncogene) and BAT26 (26-base pair pA 
tract from an intron of the mismatch repair gene hMSH2) 
loci as previously described (Parsons et al., Cancer Res. 
55:5548-5550, 1995). Briefly, segments of the BAT25 and 
BAT26 loci were amplified for 30 cycles using one 32P- 
labeled primer and one unlabeled primer for each locus. 
Reactions were resolved on urea-formamide gels and 
exposed to film. The forward and reverse primers that were 
used for the amplification of BAT25 and BAT26 were: 
BAT25 (TCGCCTCCAAGAATGTAAGT [SEQ ID NO. 
31], TCTGCATTTTAACTATGGCTC [SEQ ID NO. 32]); 
and BAT26 (TGACTACTTTTGACTTCAGCC [SEQ ID 
NO. 33], AACCATTCAACATTTTTAACCC [SEQ ID NO. 
34]). 

[0102] FIG. 8 shows the correlation between MLH1 gene
expression, MSI status and promoter methylation of MLH1,
as determined by the inventive process. The upper chart
shows the MLH1 expression levels measured by quantitative,
real-time RT-PCR (TaqMan®) in matched normal
(hatched bars) and tumor (solid black bars) colorectal
samples. The expression levels are displayed as a ratio
between MLH1 and ACTB measurements. Microsatellite
instability status (MSI) is indicated by the circles located
between the two charts. A black circle denotes MSI positivity,
while an open circle indicates that the sample is MSI
negative, as determined by analysis of the BAT25 and
BAT26 loci. The lower chart shows the methylation status of
the MLH1 locus as determined by the inventive process. The
methylation levels are represented as the ratio between the
MLH1 methylated reaction and the MYOD1 reaction.
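
The ratio described for the lower chart can be sketched in Python as follows. This is a minimal sketch, assuming that each reaction yields a single positive quantity and that the ratio is a plain quotient; the function name and the numbers for the matched normal/tumor pair are hypothetical and are not data from FIG. 8.

```python
def methylation_ratio(methylated_signal: float, control_signal: float) -> float:
    """Ratio of the methylated-gene reaction to the control reaction.

    The control reaction (MYOD1 in the text) normalizes for the amount
    of input DNA, so its signal must be positive.
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return methylated_signal / control_signal


# Hypothetical quantities for one matched normal/tumor pair.
normal = methylation_ratio(methylated_signal=4.0, control_signal=64.0)
tumor = methylation_ratio(methylated_signal=32.0, control_signal=64.0)

# Fold change of the tumor relative to its matched normal tissue.
fold_change = tumor / normal
print(normal, tumor, fold_change)  # 0.0625 0.5 8.0
```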

[0103] Four colorectal tumors had significantly elevated
methylation levels compared to the corresponding normal
tissue. One of these (tumor 17) exhibited a particularly high
degree of MLH1 methylation, as scored by the inventive
process. Tumor 17 was the only sample that was both MSI
positive (black circle) and showed transcriptional silencing
of MLH1. The remaining methylated tumors expressed
MLH1 at modest levels and were MSI negative (white
circle). These results show that MLH1 was biallelically
methylated in tumor 17, resulting in epigenetic silencing and
consequent microsatellite instability, whereas the other
tumors showed lesser degrees of MLH1 promoter hypermethylation
and could have just one methylated allele, allowing
expression from the unaltered allele. Accordingly, the
inventive process was capable of rapidly generating significant
biological information, such as promoter CpG island
hypermethylation in human tumors, which is associated with
the transcriptional silencing of genes relevant to the cancer
process.

[0104] COBRA and MsSNuPE Reactions. 

[0105] ESR1 and APC genes were analyzed using
COBRA (Combined Bisulfite Restriction Analysis). For
COBRA analysis, methylation-dependent sequence differences
were introduced into the genomic DNA by standard
bisulfite treatment according to the procedure described by
Frommer et al. (Proc. Natl. Acad. Sci. USA 89:1827-1831,
1992) (1 µg of salmon sperm DNA was added as a carrier
before the genomic DNA was treated with sodium bisulfite).
PCR amplification of the bisulfite-converted DNA was
performed using primers specific for the CpG islands of
interest, followed by restriction endonuclease digestion, gel
electrophoresis, and detection using specific, labeled hybridization
probes. The forward and reverse primer sets used for
the ESR1 and APC genes are: TCCTAAAACTACACTTACTCC
[SEQ ID NO. 35], GGTTATTTGGAAAAAGAGTATAG
[SEQ ID NO. 36] (ESR1 promoter); and
AGAGAGAAGTAGTTGTGTTAAT [SEQ ID NO. 37],
ACTACACCAATACAACCACAT [SEQ ID NO. 38] (APC
promoter), respectively. PCR products of ESR1 were
digested by restriction endonucleases TaqI and BstUI, while
the products from APC were digested by TaqI and SfaNI,
to measure methylation of 3 CpG sites for APC and 4 CpG
sites for ESR1. The digested PCR products were electrophoresed
on denaturing polyacrylamide gel and transferred
to nylon membrane (Zetabind; American Bioanalytical) by
electroblotting. The membranes were hybridized with a 5'-end
labeled oligonucleotide to visualize both digested and undigested
DNA fragments of interest. The probes used are as
follows: ESR1, AAACCAAAACTC [SEQ ID NO. 39]; and
APC, CCCACACCCAACCAAT [SEQ ID NO. 40]. Quantitation
was performed with the PhosphorImager 445SI
(Molecular Dynamics). Calculations were performed in
Microsoft Excel. The level of DNA methylation at the
investigated CpG sites was determined by calculating the
percentage of the digested PCR fragments (Xiong and Laird,
supra).
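
The percentage calculation mentioned at the end of the preceding paragraph can be sketched as below. This is a minimal sketch, assuming that the digested and undigested band intensities have already been quantitated and that the percentage is taken over their sum; the function name and the example intensities are hypothetical.

```python
def percent_methylation(digested: float, undigested: float) -> float:
    """Percentage of the PCR product that was cut by the restriction enzyme.

    After bisulfite treatment the restriction site is retained only in
    molecules that were methylated at that CpG, so the digested fraction
    is read as the methylation level at the site.
    """
    if digested < 0 or undigested < 0:
        raise ValueError("band intensities must be non-negative")
    total = digested + undigested
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * digested / total


# Hypothetical band intensities in arbitrary units.
print(percent_methylation(digested=30.0, undigested=90.0))  # 25.0
```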

[0106] MLH1 and CDKN2A were analyzed using MsSNuPE
(Methylation-sensitive Single Nucleotide Primer
Extension Assay), performed as described by Gonzalgo and
Jones (Nucleic Acids Res. 25:2529-2531). PCR amplification
of the bisulfite-converted DNA was performed using
primers specific for the CpG islands of interest, and detection
was performed using additional specific primers (extension
probes). The forward and reverse primer sets used for
the MLH1 and CDKN2A genes are: GGAGGTTATAAGAGTAGGGTTAA
[SEQ ID NO. 41], CCAACCAATAAAAACAAAAATACC
[SEQ ID NO. 42] (MLH1
promoter); GTAGGTGGGGAGGAGTTTAGTT [SEQ ID
NO. 43], TCTAATAACCAACCAACCCCTCC [SEQ ID
NO. 44] (CDKN2A promoter); and
TTGTATTATTTTGTTTTTTTTGGTAGG [SEQ ID NO. 45],
CAACTTCTCAAATCATCAATCCTCAC [SEQ ID NO. 46]
(CDKN2A Exon 2), respectively. The MsSNuPE extension
probes are located immediately 5' of the CpG to be analyzed,
and the sequences are: TTTAGTAGAGGTATATAAGTT
[SEQ ID NO. 47], TAAGGGGAGAGGAGGAGTTTGAGAAG
[SEQ ID NO. 48] (MLH1 promoter
sites 1 and 2, respectively), TTTGAGGGATAGGGT [SEQ
ID NO. 49], TTTTAGGGGTGTTATATT [SEQ ID



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(iii) NUMBER OF SEQUENCES: 54



(2) INFORMATION FOR SEQ ID NO : 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1: 
GGCGTTCGTT TTGGGATTG 19 



(2) INFORMATION FOR SEQ ID NO : 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: No
(ix) FEATURE:

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2: 

CGATAAAACC GAACGACCCG ACGA 24 



(2) INFORMATION FOR SEQ ID NO : 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3: 
GCCGACACGC GAACTCTAA 19 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 4: 
ACACATATCC CACCAACACA CAA 23

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE: 

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CAACCCTACC CCAAAAACCT ACAAATCCAA 30

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6: 
AGGAGTTGGT GGAGGGTGTT T 21 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7: 
CTATCGCCGC CTCATCGT 18 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE:

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8: 

CGCGACGTCA AACGCCACTA CG 22 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CGTTATATAT CGTTCGTAGT ATTCGTGTTT 30



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTATATGTCG GTTACGTGCG TTTATAT 27



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: No
(ix) FEATURE:

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 11: 

CCCGTCGAAA ACCCGCCGAT TA 22 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: No 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GAACCAAAAC GCTCCCCAT 19 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GGGTTGTGAG GGTATATTTT TGAGG 25 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE: 

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CCCACCCAAC CACACAACCT ACCTAACC 28

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 15: 
CCAACCCACA CTCCACAATA AA 22 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 16: 
AACAACGTCC GCACCTCCT 19 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE: 

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ACCCGACCCC GAACCGCG 18

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
TGGAATTTTC GGTTGATTGG TT 22 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 19: 
CAACCAATCA ACCAAAAATT CCAT 24



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE:

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CCACCACCCA CTATCTACTC TCCCCCTC 28



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GGTGGATTGT GTGTGTTTGG TG 22 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CCAACTCCAA ATCCCCTCTC TAT 23



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: No
(ix) FEATURE:

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TCCCTTCCTA TTCCTAAATC CAACCTAAAT ACCTCC 36

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGATTAATTT AGATTGGGTT TAGAGAAGGA 30

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TGAGCGCGGC TACAGCTT 18 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE: 

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

ACCACCACGG CCGAGCGG 18 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CCTTAATGTC ACACACGATT 20

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GTTCTCCGGG AGATGTTGCA TA 22 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(ix) FEATURE: 

(A) NAME/KEY: 5' substitution with fluorescent reporter dye
6FAM (2,7-dimethoxy-4,5-dichloro-6-carboxy-fluorescein-
phosphoramidite-cytosine); 3' substitution with quencher dye
TAMRA (6-carboxytetramethylrhodamine).

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

CCTCAGTGGG CCTTGGCACA GC 22 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 30: 
TGGTGGTGTT GAGAAGGTAT AACTTG 26

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 

(x) PUBLICATION INFORMATION:

(A) AUTHORS: Parsons, et al

(B) TITLE: Microsatellite Instability and Mutations of the
Transforming Growth Factor B Type II Receptor Gene in
Colorectal Cancer

(C) JOURNAL: Cancer Res.

(D) VOLUME: 55

(F) PAGES: 5548-5550

(G) DATE: 01-DEC-1995

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TCGCCTCCAA GAATGTAAGT 20


(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 

(x) PUBLICATION INFORMATION:

(A) AUTHORS: Parsons, et al

(B) TITLE: Microsatellite Instability and Mutations of the
Transforming Growth Factor B Type II Receptor Gene in
Colorectal Cancer

(C) JOURNAL: Cancer Res.

(D) VOLUME: 55

(F) PAGES: 5548-5550

(G) DATE: 01-DEC-1995

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TCTGCATTTT AACTATGGCT C 21



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 

(x) PUBLICATION INFORMATION:

(A) AUTHORS: Parsons, et al

(B) TITLE: Microsatellite Instability and Mutations of the
Transforming Growth Factor B Type II Receptor Gene in
Colorectal Cancer

(C) JOURNAL: Cancer Res.

(D) VOLUME: 55

(F) PAGES: 5548-5550

(G) DATE: 01-DEC-1995

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
TGACTACTTT TGACTTCAGC C 21



(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 

(x) PUBLICATION INFORMATION:

(A) AUTHORS: Parsons, et al

(B) TITLE: Microsatellite Instability and Mutations of the
Transforming Growth Factor B Type II Receptor Gene in
Colorectal Cancer

(C) JOURNAL: Cancer Res.

(D) VOLUME: 55

(F) PAGES: 5548-5550

(G) DATE: 01-DEC-1995

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 34: 
AACCATTCAA CATTTTTAAC CC 22 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 35: 
TCCTAAAACT ACACTTACTC C 21 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 36: 
GGTTATTTGG AAAAAGAGTA TAG 23

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
AGAGAGAAGT AGTTGTGTTA AT 22

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 38: 
ACTACACCAA TACAACCACA T 21 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 39: 
AAACCAAAAC TC 12 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 40: 
CCCACACCCA ACCAAT 16 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GGAGGTTATA AGAGTAGGGT TAA 23

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: No
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CCAACCAATA AAAACAAAAA TACC 24



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 43: 
GTAGGTGGGG AGGAGTTTAG TT 22



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 44: 
TCTAATAACC AACCAACCCC TCC 23



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 45: 
TTGTATTATT TTGTTTTTTT TGGTAGG 27



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
CAACTTCTCA AATCATCAAT CCTCAC 26

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 47: 
TTTAGTAGAG GTATATAAGT T 21 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 48: 
TAAGGGGAGA GGAGGAGTTT GAGAAG 26

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 49: 
TTTGAGGGAT AGGGT 15 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TTTTAGGGGT GTTATATT 18

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TTTTTTTGTT TGGAAAGATA T 21 



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GTTGGTGGTG TTGTAT 16

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
AGGTTATGAT GATGGGTAG 19



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: No 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
TATTAGAGGT AGTAATTATG TT 22 
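
Each entry above declares a LENGTH and then gives the sequence in space-separated blocks, so the two can be cross-checked mechanically. The following Python sketch shows one way to do that for a few entries copied from this listing. The function name is hypothetical, and the check assumes that only the four unambiguous DNA bases occur.

```python
def declared_length_matches(sequence: str, declared_length: int) -> bool:
    """Return True if the sequence has exactly the declared number of bases.

    Whitespace between the 10-base blocks of the listing is ignored.
    Any character other than A, C, G or T is treated as an error.
    """
    bases = "".join(sequence.split()).upper()
    unexpected = set(bases) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return len(bases) == declared_length


# SEQ ID NO -> (sequence as printed, declared LENGTH), taken from the listing.
entries = {
    1: ("GGCGTTCGTT TTGGGATTG", 19),
    39: ("AAACCAAAAC TC", 12),
    54: ("TATTAGAGGT AGTAATTATG TT", 22),
}

for seq_id, (sequence, declared) in entries.items():
    print(seq_id, declared_length_matches(sequence, declared))
```

A check of this kind catches dropped or misread bases, such as a letter O standing in for a C, but it cannot detect a substitution between valid bases.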



We claim: 

1. A method for detecting cytosine methylation and 
methylated CpG islands within a genomic sample of DNA 
comprising: 

(a) contacting a genomic sample of DNA with a modify- 
ing agent that modifies unmethylated cytosine to pro- 
duce a converted nucleic acid; 

(b) amplifying the converted nucleic acid by means of two
oligonucleotide primers in the presence or absence of
one or a plurality of specific oligonucleotide probes,
wherein one or a plurality of oligonucleotide primers
and/or the specific probe(s) are capable of distinguishing
between unmethylated and methylated nucleic acid;
and

(c) detecting the methylated nucleic acid based on
amplification-mediated digestion of the probe.

2. The method of claim 1 wherein the amplifying step is
a polymerase chain reaction (PCR).

3. The method of claim 1 wherein the modifying agent is 
bisulfite. 
4. The method of claim 1 wherein the converted nucleic
acid contains uracil in place of unmethylated cytosine resi- 
dues present in the unmodified nucleic acid-containing 
sample. 

5. The method of claim 1 wherein the probe further 
comprises one or a plurality of fluorescence label moieties. 

6. The method of claim 5 wherein the amplification and
detection step comprises fluorescence-based quantitative
PCR.

7. A method for detecting a methylated CpG-containing 
nucleic acid comprising: 

(a) contacting a nucleic acid-containing sample with a 
modifying agent that modifies unmethylated cytosine 
to produce a converted nucleic acid; 

(b) amplifying the converted nucleic acid in the sample by
means of oligonucleotide primers in the presence of a
CpG-specific oligonucleotide probe, wherein the
CpG-specific probe, but not the primers, distinguishes between
modified unmethylated and methylated nucleic acid;
and

(c) detecting the methylated nucleic acid based upon an 
amplification-mediated displacement of the CpG-spe- 
cific probe. 

8. The method of claim 7 wherein the amplifying step
comprises a polymerase chain reaction (PCR).

9. The method of claim 7 wherein the modifying agent 
comprises bisulfite. 

10. The method of claim 7 wherein the converted nucleic 
acid contains uracil in place of unmethylated cytosine resi- 
dues present in the unmodified nucleic acid-containing 
sample. 

11. The method of claim 7 wherein the detection method 
is by means of a measurement of a fluorescence signal based 
on amplification-mediated displacement of the CpG-specific 
probe. 

12. The method of claim 7 wherein the amplification and
detection method comprises fluorescence-based quantitative
PCR.

13. The method of claim 7 wherein methylation amounts 
in the nucleic acid sample are quantitatively determined 
based on reference to a control reaction for amount of input 
nucleic acid. 

14. A method for detecting a methylated CpG-containing 
nucleic acid comprising: 

(a) contacting a nucleic acid-containing sample with a 
modifying agent that modifies unmethylated cytosine 
to produce a converted nucleic acid; 

(b) amplifying the converted nucleic acid in the sample by
means of oligonucleotide primers and in the presence
of a CpG-specific oligonucleotide probe, wherein both
the primers and the CpG-specific probe distinguish
between modified unmethylated and methylated
nucleic acid; and

(c) detecting the methylated nucleic acid based on
amplification-mediated displacement of the CpG-specific
probe.

15. The method of claim 14 wherein the amplifying step
comprises a polymerase chain reaction (PCR).

16. The method of claim 14 wherein the modifying agent 
is bisulfite. 

17. The method of claim 14 wherein the converted nucleic 
acid contains uracil in place of unmethylated cytosine resi- 
dues present in the unmodified nucleic acid-containing 
sample. 

18. The method of claim 14 wherein the detection method 
comprises measuring a fluorescence signal based on ampli- 
fication-mediated displacement of the CpG-specific probe. 

19. The method of claim 14 wherein the amplification and
detection method is fluorescence-based quantitative PCR.

20. A methylation detection kit useful for the detection of 
a methylated CpG-containing nucleic acid comprising a 
carrier means being compartmentalized to receive in close 
confinement therein one or more containers comprising: 

(i) a first container containing a modifying agent that 
modifies unmethylated cytosine to produce a converted 
nucleic acid; 

(ii) a second container containing primers for amplifica- 
tion of the converted nucleic acid; 

(iii) a third container containing primers for the amplifi- 
cation of control unmodified nucleic acid; and 

(iv) a fourth container containing a specific oligonucleotide
probe the detection of which is based on
amplification-mediated displacement,

wherein the primers and probe each may or may not 
distinguish between unmethylated and methylated 
nucleic acid. 

21. The kit of claim 20, wherein the modifying agent is 
bisulfite. 

22. The kit of claim 20 wherein the modifying agent 
converts cytosine residues to uracil residues. 

23. The kit of claim 20, wherein the specific oligonucle- 
otide probe is a CpG-specific oligonucleotide probe, and 
wherein the probe, but not the primers for amplification of 
the converted nucleic acid, distinguishes between modified 
unmethylated and methylated nucleic acid. 

24. The kit of claim 20, wherein the specific oligonucle- 
otide probe is a CpG-specific oligonucleotide probe, and 
wherein both the probe and the primers for amplification of 
the converted nucleic acid, distinguish between modified 
unmethylated and methylated nucleic acid. 

25. The kit of claim 20, wherein the probe further com- 
prises a fluorescent moiety linked to an oligonucleotide base 
directly or through a linker moiety. 

26. The kit of claim 20, wherein the probe is a specific, 
dual-labeled TaqMan® probe. 
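
The conversion recited in claims 17 and 22 (unmethylated cytosine replaced by uracil, methylated cytosine left unchanged) can be illustrated with a short sketch. The function name, the 0-based position convention, and the example sequence are assumptions made for illustration only; they do not come from the claims.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Replace every unmethylated C with U; keep methylated C unchanged.

    `methylated` holds hypothetical 0-based positions of methylated cytosines.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("U")  # unmethylated cytosine is converted
        else:
            out.append(base)  # methylated C and all other bases are kept
    return "".join(out)


# Hypothetical example: only the cytosine at position 1 is methylated.
converted = bisulfite_convert("ACGTCCGA", methylated={1})
```

After conversion, the two forms differ in sequence at each CpG site, which is the difference a CpG-specific probe can detect.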
ABSTRACT



Enzymatic synthesis of oligonucleotides is performed by the 
steps of: (a) combining a primer and a blocked nucleotide in 
the presence of a chain extending enzyme to form a primer- 
blocked nucleotide product containing the blocked nucle- 
otide coupled to the primer at its 3'-end; (b) removing the 
blocking group from the 3' end of the primer-blocked 
nucleotide product; and (c) repeating the cycle of steps (a) 
and (b), using the primer-nucleotide product of step (b) as 
the primer for step (a) in the next cycle, for sufficient cycles 
to form the oligonucleotide product. Cycles may optionally 
include the step of converting any unreacted blocked nucle- 
otide to an unreactive form which is substantially less active 
as a substrate for the chain extending enzyme. Cycles may 
also include the step of removing the blocking group from 
unreacted blocked nucleotide. This step is unnecessary, 
however, when the same nucleotide is added in two or more 
successive cycles. The synthetic cycles are preferably per- 
formed in a single vessel without intermediate purification 
of oligonucleotide product. 

29 Claims, 10 Drawing Sheets 
UNCONTROLLED METHOD

primer + nucleotide

↓ Transferase enzyme incubation

primer + (primer+1) + (primer+2) + (primer+3) + ... etc.

↓ Purify (primer+1) product from side products

(primer+1)

Repeat cycle until oligonucleotide synthesis is complete

Fig. 1A
BLOCKED METHOD

primer + blocked-nucleotide

↓ Transferase enzyme incubation

primer + (primer+1)-blocked + blocked-nucleotide

↓ Inactivation/Removal of Transferase
↓ Removal of Blocking group (chemical or enzymatic)

primer + (primer+1) + nucleotide

↓ Purify (primer+1) product from nucleotide and primer

(primer+1)

Repeat cycle until oligonucleotide synthesis is complete

Fig. 1B
(A) BASIC MODE

primer + App(d)Np (or ATP + 3',5'-(d)NDP)

↓ RNA Ligase incubation, then optionally heat inactivate

primer-p(d)Np + AMP + App(d)Np

↓ Alkaline Phosphatase incubation, then heat inactivate

primer-p(d)N + Adenosine + App(d)N + PO4

Repeat cycle until oligonucleotide synthesis is complete

Fig. 2A

(B) PREFERRED MODE

primer + App(d)Np (or ATP + 3',5'-(d)NDP)

↓ RNA Ligase incubation, heat inactivation optional

primer-p(d)Np + AMP + App(d)Np

↓ Exonuclease + Nucleotide Pyrophosphatase incubation
  (for example, phosphodiesterase I),
  then heat inactivate

primer-p(d)Np + AMP + 3',5'-(d)NDP

↓ Alkaline Phosphatase incubation, then heat inactivate

primer-p(d)N + adenosine + (deoxy)nucleoside + PO4

Repeat cycle until oligonucleotide synthesis is complete

Fig. 2B
(A) BASIC MODE

primer + AppNp

↓ RNA Ligase incubation, then heat inactivate

primer-pNp + AMP + AppNp

↓ 3'-Phosphatase incubation, then heat inactivate

primer-pN + PO4 + AMP + AppNp

Repeat cycle until nucleotide substrate has been added to primer
the desired number of times

Fig. 3A
(B) PREFERRED MODE

primer + AppNp

↓ RNA Ligase incubation, heat inactivation optional

primer-pNp + AMP + AppNp

↓ Exonuclease incubation, then heat inactivate Exonuclease and RNA Ligase

primer-pNp + AMP + AppNp

↓ 3'-Phosphatase incubation, then heat inactivate

primer-pN + PO4 + AMP + AppNp

Repeat cycle until nucleotide substrate has been added to primer
the desired number of times

Fig. 3B
AAA (Initial primer)

↓ add AppCp; One Pot without substrate reuse

AAAC

↓ add AppUp; One Pot with substrate reuse

AAACU

↓ no substrate added; One Pot without substrate reuse

AAACUU

↓ add AppGp; One Pot with substrate reuse

AAACUUG

↓ no substrate added; One Pot with substrate reuse

AAACUUGG

↓ no substrate added; One Pot without substrate reuse

AAACUUGGG

FIGURE 4
[Reaction scheme; labels recoverable from the drawing: pyrophosphate, App(d)Np, adenine, p(d)N1p(d)N2p, App(d)N1p(d)N2p]

FIGURE 5
primer-2'-phosphate + AppN-2',3'-cyclic phosphate

↓ Transfer RNA Ligase incubation, then heat inactivate

primer-2'-phosphate-pN-2',3'-cyclic phosphate + AppN-2',3'-cyclic phosphate + AMP

↓ Alkaline Phosphatase + Nucleotide Pyrophosphatase incubation,
  then heat inactivate

primer-pN-2',3'-cyclic phosphate + N-2',3'-cyclic phosphate + Adenosine

↓ Cyclic Phosphodiesterase incubation, then heat inactivate

primer-pN-2'-phosphate + 2'-NMP + Adenosine

Repeat cycle until oligonucleotide synthesis is complete
Terminal 2'-phosphate can be removed with Alkaline Phosphatase

FIGURE 6
primer-2',3'-cyclic phosphate + N-2'-phosphate, 3'-phospho-LG

↓ HeLa/Eubacteria RNA Ligase incubation, then heat inactivate

primer-pN-2'-phosphate, 3'-phospho-LG + N-2'-phosphate, 3'-phospho-LG

↓ Phosphatase incubation, then heat inactivate and cyclize

primer-pN-2',3'-cyclic phosphate + N-2',3'-cyclic phosphate + LG

Repeat cycle until oligonucleotide synthesis is complete

FIGURE 7
5'-NUCLEOTIDASE



FIGURE 8 
FIGURE 9
METHOD FOR ENZYMATIC SYNTHESIS OF
OLIGONUCLEOTIDES

This application is a continuation-in-part of U.S. patent
applications Ser. Nos. 08/161,224 filed Dec. 2, 1993, now
U.S. Pat. No. 5,516,664 issued May 14, 1996; 08/100,671
filed Jul. 30, 1993; and 07/995,791 filed Dec. 23, 1992, now
U.S. Pat. No. 5,436,143 issued Jul. 25, 1995.



BACKGROUND OF THE INVENTION 

Synthetic oligonucleotides play a key role in molecular
biology research, useful especially for DNA sequencing,
DNA amplification, and hybridization. A novel "one pot"
enzymatic method is described to replace both the obsolete
enzymatic methods and the current phosphoramidite chemical
method. This new method promises increased throughput
and reliability, ease of automation, and lower cost.

Before the introduction of the phosphoramidite chemical
method in 1983, enzymatic methods were used for the
synthesis of oligonucleotides. Historically, two distinct
enzymatic approaches have been employed as summarized
in FIG. 1. These enzymatic methods have been abandoned,
however, in favor of the superior phosphoramidite chemical
method.

The first enzymatic approach is the "uncontrolled"
method. As depicted in FIG. 1A, a short oligonucleotide
primer is incubated with the desired nucleotide and a
nucleotidyl transferase. At the end of the optimal incubation
period, a mixture of oligonucleotide products containing
different numbers of bases added to the primer (i.e. primer,
primer+1, primer+2 . . . ) is obtained. The desired product,
the primer with one added base, is purified using either
electrophoresis or chromatography. The process of enzyme
incubation and oligonucleotide purification is repeated until
the desired oligonucleotide is synthesized. Examples of the
use of this approach are: (1) Polynucleotide Phosphorylase
("PNP") and ADP, GDP, CDP, and UDP have been used to
make oligodeoxyribonucleotides in accordance with the
following reaction:

primer + rNDP -> (primer-rN)-3'-OH + PO4

Shum et al., Nucleic Acids Res., 5(7): 2297-311 (1978), and
(2) Terminal deoxynucleotidyl Transferase and the
nucleotides dATP, dGTP, dCTP, and dTTP have been used to
make oligodeoxyribonucleotides in accordance with the
following reaction:

primer + dNTP -> (primer-dN)-3'-OH + pyrophosphate

Schott et al., Eur. J. Biochem., 143: 613-20 (1984). The flaws
of the "uncontrolled" approach are the requirement for
cumbersome manual purification of the primer+1 product
after each coupling cycle, poor yields of the desired
primer+1 product, and inability to automate.

The second enzymatic approach is the "blocked" method,
shown in FIG. 1B. The nucleotide used in the extension step
is blocked in some manner to prevent the nucleotidyl
transferase from adding additional nucleotides to the
oligonucleotide primer. After the extension step, the
oligonucleotide product is separated from the enzyme and
nucleotide, and the blocking group is removed by altering the
chemical conditions or by the use of a second enzyme. The
oligonucleotide product is now ready for the next extension
reaction. Examples of this approach are: (1) PNP and
NDP-2'-acetal blocked nucleotides have been used to make
oligoribonucleotides. The acetal blocking group is removed
under acidic conditions (Gilham et al., Nature, 233: 551-3
(1971) and U.S. Pat. No. 3,850,749), (2) RNA ligase and the
blocked nucleotide App(d)Np (or ATP + 3',5'-(d)NDP) have
been used to make oligoribonucleotides and
oligodeoxyribonucleotides. The 3'-phosphate blocking group is
removed enzymatically with a phosphatase such as alkaline
phosphatase (T. E. England et al., Biochemistry, (1978), 17(11),
2069-81; D. M. Hinton et al., Nucleic Acids Research,
(1982), 10(6), 1877-94).

The advantage of the "blocked" method over the "uncon- 
trolled" method is that only one nucleotide can be added to 
the primer. Unfortunately, the "blocked" method has several 
flaws which led to its abandonment in favor of the chemical 
method. The "blocked method", like the "uncontrolled" 
method, requires the purification of the oligonucleotide 
product from the reaction components after each coupling 
cycle. 

In the first approach, using PNP, the oligonucleotide is 
exposed to acid to remove the acid-labile acetal blocking 
group. Oligonucleotide product must be purified and redis- 
solved in fresh buffer in preparation for the next polymer- 
ization reaction for two reasons: (1) PNP requires near 
neutral pH conditions whereas acetal removal requires 
approximately pH 1; and (2) the product of the polymeriza- 
tion reaction, PO4, must be removed or it will cause phos- 
phorolysis of the oligoribonucleotide catalyzed by PNP. 

In the second approach, using RNA ligase, the art teaches
that oligonucleotide product needs to be purified after each
cycle because the dinucleotide App(d)N, formed by
phosphatase treatment of App(d)Np, is still a suitable substrate
for RNA ligase and must be completely removed prior to
addition of RNA ligase in the next cycle. England et al.,
Proc. Natl. Acad. Sci. USA, 74(11): 4839-42 (1977). Hinton
et al. emphasize the importance of purifying oligonucleotide
product after each cycle by stating: "This elution profile [a
DEAE-sephadex chromatogram of oligodeoxyribonucleotide
product] also demonstrates the absence of either
significant contaminating products arising from nucleases or of
the reaction intermediate, A-5'pp5'-dUp. The absence of
such substances is critical if this general methodology is to
be useful for synthesis." Hinton et al., Nucleic Acids
Research, 10(6): 1877-94 (1982). The art also teaches that
nucleoside and phosphate by-products generated by
phosphatase incubation of the RNA ligase reaction mixture
substantially inhibit RNA ligase activity and must be
removed prior to subsequent RNA ligation steps for the
method to work usefully. Middleton et al., Anal. Biochem.,
144: 110-117 (1985).

Two modifications have been devised for the "blocked"
method to improve the oligonucleotide product yield and to
speed required oligonucleotide product purification after
each coupling cycle. The first modification was the use of
a branched synthetic approach (Oligonucleotide Synthesis: a
practical approach, M. J. Gait editor, (1985), pp. 185-97,
IRL Press). This approach improved the yield of final
oligonucleotide product, but intermediate purification of
oligonucleotide after each coupling cycle was still required.
The second modification was the covalent attachment of the
primer chain to a solid phase support (A. V. Mudrakovskaia
et al., Bioorg. Khim., (1991), 17(6), 819-22). This allows the
oligonucleotide to be purified from all reaction components
simply by washing the solid phase support column.
However, product yields are still low, and primer chains which do
not couple during a cycle are not removed and are carried
over to the next coupling cycle. It appears that the poor
coupling efficiency results from steric problems encountered
by the enzyme in gaining access to the covalently bound
primer chain. Unfortunately, it is not possible to combine
these two modifications in an automated manner. The
current phosphoramidite chemical method for oligonucleotide
synthesis also utilizes a solid phase support to facilitate
oligonucleotide purification after each coupling reaction.

The present invention provides a method for enzymatic
oligonucleotide synthesis which is preferably performed
entirely in a single tube, requiring only temperature control
and liquid additions, and not requiring intermediate
purifications or solid phase supports. This method is well suited
for automation on a liquid handling robot apparatus,
allowing the simultaneous preparation of a thousand
oligonucleotides per day in microtiter plates. This capability dwarfs the
best commercially available instrument which can prepare
only four oligonucleotides simultaneously with the
phosphoramidite method (Applied Biosystems, Inc.).

SUMMARY OF THE INVENTION

This invention provides a method for enzymatic synthesis 
of oligonucleotides of defined sequence. The method 
involves the steps of: 

(a) combining an oligonucleotide primer and a blocked
nucleotide, or a blocked nucleotide precursor that forms a
blocked nucleotide in situ in a reaction mixture, in the
presence of a chain extending enzyme effective to couple the
blocked nucleotide to the 3'-end of the oligonucleotide
primer such that a primer-blocked nucleotide product is
formed, wherein the blocked nucleotide comprises a
nucleotide to be added to form part of the defined sequence and
a 3'-blocking group attached to the nucleotide effective to
prevent the addition of more than one blocked nucleotide to
the primer;

(b) removing the blocking group from the 3'-end of the
primer-blocked nucleotide product to form a
primer-nucleotide product; and

(c) repeating at least one cycle of steps (a) and (b) using
the primer-nucleotide product from step (b) as the
oligonucleotide primer of step (a) of the next cycle, without prior
separation of the primer-nucleotide product from the
reaction mixture, using blocked nucleotides appropriate to the
defined sequence of the oligonucleotide being synthesized.

When the defined sequence calls for the same nucleotide
to be incorporated more than once in succession, unreacted
blocked nucleotide may be reused in the subsequent
cycle(s). In this case, the blocking group is selectively
removed from the primer-blocked nucleotide product
substantially without deblocking of the unreacted blocked
nucleotide. Otherwise, the method includes the further step
of converting any unreacted blocked nucleotide to an
unreactive form which is substantially less active as a substrate
for the chain extending enzyme than the blocked nucleotide.
The method of the invention is preferably performed in a
single reaction vessel, without intermediate purification of
oligonucleotide product.

In accordance with one embodiment of the invention, a
single cycle comprises the steps in sequence:

(a) incubation of an oligonucleotide primer with RNA
ligase and App(d)Np or App(d)N1p(d)N2p or precursors
thereof, wherein App is an adenosine diphosphate moiety,
and Np, N1 and N2 are a 3'-phosphate-blocked nucleoside
moiety, to form a primer-pNp product;

(b) incubation with a Phosphatase; and 

(c) heat inactivation of the Phosphatase. 
By careful selection of the conditions of the reaction with
the Phosphatase, the selectivity of the enzymatic
dephosphorylation reaction can be controlled, such that
unreacted blocked nucleotide substrate is either
substantially inactivated when it is not to be reused, or
substantially left intact when reuse is desired.

In accordance with a preferred embodiment, a single cycle 
of the method comprises the steps in sequence: 

(a) incubation of an oligonucleotide primer with RNA
ligase and App(d)Np or App(d)N1p(d)N2p or precursors
thereof;

(b) incubation with an exonuclease and a nucleotide 
pyrophosphatase (e.g. snake venom phosphodiesterase I); 

(c) heat inactivation of the Exonuclease and Nucleotide 
Pyrophosphatase; 

(d) incubation with a Phosphatase; and 

(e) heat inactivation of the Phosphatase. 
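
The two cycle definitions above can be written down as ordered step lists, which is roughly what a liquid-handling script would iterate over. This is a minimal sketch: the names BASIC_CYCLE, PREFERRED_CYCLE and protocol are hypothetical, and incubation times and temperatures are omitted because this passage does not give them.

```python
from string import ascii_lowercase

# Steps of one cycle, paraphrased from the two embodiments described above.
BASIC_CYCLE = (
    "incubate primer with RNA ligase and App(d)Np (or precursors)",
    "incubate with phosphatase",
    "heat-inactivate phosphatase",
)

PREFERRED_CYCLE = (
    "incubate primer with RNA ligase and App(d)Np (or precursors)",
    "incubate with exonuclease and nucleotide pyrophosphatase",
    "heat-inactivate exonuclease and nucleotide pyrophosphatase",
    "incubate with phosphatase",
    "heat-inactivate phosphatase",
)


def protocol(n_cycles, preferred=True):
    """Return (cycle number, step letter, description) for every step."""
    steps = PREFERRED_CYCLE if preferred else BASIC_CYCLE
    return [
        (cycle, ascii_lowercase[i], text)
        for cycle in range(1, n_cycles + 1)
        for i, text in enumerate(steps)
    ]


# Example: a three-cycle run in the preferred mode.
for cycle, letter, text in protocol(3):
    print(f"cycle {cycle} ({letter}) {text}")
```

Every step is a liquid addition or a temperature change, consistent with the single-vessel operation described above.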

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1A and B: The "Uncontrolled" and "Blocked"
enzymatic methods previously used for the synthesis of
oligonucleotides.

FIGS. 2A and B: The method of the invention for the
synthesis of oligonucleotides.

FIGS. 3A and B: The method of the invention for the 
synthesis of repeat regions of an oligonucleotide. 

FIG. 4: Synthesis of repeat and non-repeat regions using
the method of the invention.

FIG. 5: Reactions catalyzed by RNA ligase. 

FIG. 6: An embodiment of the invention utilizing Transfer
RNA ligase as the chain extending enzyme.

FIG. 7: An embodiment of the invention utilizing HeLa/ 
Eubacterial RNA Ligase as the chain extending enzyme. 

FIG. 8: The structure of AMP and the cleavage points of
various enzymes.

FIG. 9: Apparatus for practicing the method of the
invention, suitable for synthesizing many oligonucleotides
simultaneously.

FIG. 10: Apparatus for practicing the method of the 
invention, suitable for the bulk synthesis of an oligonucle- 
otide. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The present invention provides a method for synthesizing
oligonucleotides enzymatically which can be performed in a
single vessel without the need for any intermediate
purification step. An embodiment of the method of the invention
in Basic Mode is shown in FIG. 2A. In this embodiment,
new nucleotide substrate is added for each new cycle.

As shown in FIG. 2A, a reaction mixture is formed
containing an oligonucleotide primer, a blocked nucleotide 
substrate, and a chain extending enzyme such as RNA ligase 
and is incubated to couple the blocked nucleotide to the 
oligonucleotide primer. The RNA ligase may then be inac- 
tivated, for example by heating. The resulting reaction 
mixture contains the primer-blocked nucleotide product, 
unreacted primer, unreacted blocked nucleotide, and adenos- 
ine monophosphate (AMP). Alternatively, the RNA ligase 
may be left in active form and the substrate rendered inactive 
for further reaction with the primer. 

The next step as shown in FIG. 2A is incubation with an
enzyme which removes the blocking group from the
primer-blocked nucleotide product and unreacted blocked
nucleotide. The resulting reaction mixture, containing unreacted
primer, extended primer, and unblocked nucleotide substrate
can then be recycled directly for use as the primer in the
subsequent cycle without performing intermediate
purification of extended primer. Such intermediate purification is
taught by prior art as an essential step.

FIG. 3A shows an alternative embodiment of the
invention, in which the unreacted blocked nucleotide is recycled
to form a region of the oligonucleotide in which the same
base is repeated. For example, the 8-mer oligonucleotide
5'-AGUGGCCC-3' contains a consecutive repeat of G and
two consecutive repeats of C. Synthesizing the repeat region
of this oligonucleotide using the method shown in FIG. 2A
results in a significant waste of materials. In this situation it
may be preferable when synthesizing the oligonucleotide not
to inactivate or deblock the unreacted nucleotide substrate
during a cycle, so that the unreacted nucleotide can be
reused in the ensuing cycle. This is accomplished by a
modification of the method of FIG. 2A which is outlined in
FIG. 3A.
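
Locating the repeat regions of a target sequence is a simple run-length scan. The sketch below applies it to the 8-mer from the example above; the function name base_runs is hypothetical.

```python
from itertools import groupby


def base_runs(seq):
    """Split a sequence into (base, run length) pairs, 5' to 3'."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


# The 8-mer used in the text: one G run of length 2 and one C run of length 3.
runs = base_runs("AGUGGCCC")

# Runs longer than one base are the repeat regions where reuse of the
# unreacted blocked nucleotide (FIG. 3A) may be preferable.
repeat_regions = [(base, n) for base, n in runs if n > 1]
```

A run of length n contains n - 1 consecutive repeats, which matches the count given in the text: one repeat of G and two of C.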

As shown in FIG. 3A, the first step is again the addition
of a blocked nucleotide to the 3'-end of the primer. In this
case, however, the blocking group is selectively removed
from the primer-blocked nucleotide product without
significantly deblocking, and thus inactivating, the unreacted
nucleotide in the reaction mixture using a 3'-phosphatase.
The unblocked primer-nucleotide product is then used as the
primer for the next cycle and unreacted blocked nucleotide
is used as the blocked nucleotide of the next cycle. Similar
to the method of FIG. 2A, the modified method for synthesis
of repeat regions may be performed without intermediate
purification of the extended primer product. This method
may be employed for as many cycles as necessary until the
repeat region is synthesized.

In the synthesis of an oligonucleotide with at least one
repeat region and at least one non-repeat region, cycles of
both methods shown in FIGS. 2A and 3A may be employed
to provide an overall synthetic strategy in which repeat
regions are synthesized using either method, but preferably
the method of FIG. 3A, and non-repeat regions are
synthesized using the method of FIG. 2A. A hypothetical synthesis
is shown in FIG. 4.
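
The mixed strategy can be sketched as a per-cycle plan. The rule used here is one reading of FIG. 4, not a statement of the only permitted strategy: fresh substrate is added unless the previous cycle left the same blocked nucleotide in the pot, and substrate is kept for reuse only when the next base is identical. The names Cycle and plan_cycles are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    base: str              # base added to the primer in this cycle
    add_substrate: bool    # True: add fresh blocked nucleotide (e.g. AppCp)
    reuse_substrate: bool  # True: leave unreacted substrate intact (FIG. 3A)


def plan_cycles(initial_primer, target):
    """Plan one cycle per base that must be added to the initial primer."""
    if not target.startswith(initial_primer):
        raise ValueError("target must begin with the initial primer")
    to_add = target[len(initial_primer):]
    plan = []
    for i, base in enumerate(to_add):
        carried_over = i > 0 and to_add[i - 1] == base
        next_is_same = i + 1 < len(to_add) and to_add[i + 1] == base
        plan.append(Cycle(base, not carried_over, next_is_same))
    return plan


# The hypothetical synthesis of FIG. 4: AAA extended to AAACUUGGG.
plan = plan_cycles("AAA", "AAACUUGGG")
for step in plan:
    action = f"add App{step.base}p" if step.add_substrate else "no substrate added"
    mode = "with" if step.reuse_substrate else "without"
    print(f"{action}; one pot {mode} substrate reuse")
```

For this input the plan reproduces the six cycles drawn in FIG. 4, including the three cycles in which no substrate is added.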

The method of the invention is surprisingly useful
because problems identified in the prior art, which suggest
that the method would not work, have been found by the
inventor not to limit the utility of the invention. Prior art
(Hinton et al.) teaches that the extended primer product must
be separated from the reaction mixture to remove App(d)N,
which is able to couple to the primer in the next cycle. It is
the discovery of the inventor that the unblocked nucleotide,
App(d)N, is substantially less active as a substrate for RNA
ligase (e.g. 50 to 100 times less active) than the blocked
nucleotide, App(d)Np, obviating the need for separating the
unblocked nucleotide from the extended primer product.
Prior art (Middleton et al.) also teaches that the nucleoside
and phosphate by-products of the phosphatase incubation
substantially inhibit RNA ligase, and must be separated from
the extended primer product at the end of each cycle for the
method to work usefully. It is the discovery of the inventor that the
by-products of the enzymatic reactions do not significantly
inhibit the enzymes, especially RNA ligase.

Experiments performed by the inventor on each of
the reaction by-products confirm the absence of significant
inhibition of RNA ligase. The major by-products of the
method of the invention are nucleotides and PO4. No
inhibition was detected in the presence of 10 mM PO4 and 10
mM Adenosine (a typical nucleoside). Extremely weak
inhibition was observed in the presence of 100 mM PO4. In
addition, other nucleotides were tested for inhibition: no
inhibition was detected in the presence of 10 mM Adenine,
1 mM AMP, 1 mM ATP, 2 mM AppA and 10 mM 3',5'-ADP;
extremely weak inhibition was detected in the presence of
10 mM Pyrophosphate; and strong inhibition was observed
in the presence of 10 mM AMP and 10 mM ATP. Therefore,
the only two products which are strong inhibitors, ATP and
AMP, and one product which is an extremely weak inhibitor,
pyrophosphate, will never accumulate to these high
concentrations since they are degraded by Alkaline Phosphatase.
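
For reference, the inhibition observations reported above can be held in a small lookup table. The values are copied from the preceding paragraph; the table layout and the helper name strong_inhibitors are assumptions, and the sketch adds no data beyond what the text states.

```python
# (compound, concentration in mM) -> inhibition of RNA ligase as reported above
INHIBITION = {
    ("PO4", 10): "none",
    ("adenosine", 10): "none",
    ("PO4", 100): "extremely weak",
    ("adenine", 10): "none",
    ("AMP", 1): "none",
    ("ATP", 1): "none",
    ("AppA", 2): "none",
    ("3',5'-ADP", 10): "none",
    ("pyrophosphate", 10): "extremely weak",
    ("AMP", 10): "strong",
    ("ATP", 10): "strong",
}


def strong_inhibitors(table=INHIBITION):
    """Return the compounds reported as strong inhibitors at any concentration."""
    return sorted({name for (name, _), level in table.items() if level == "strong"})


strong = strong_inhibitors()
```

Only AMP and ATP are reported as strong inhibitors, and only at 10 mM; at 1 mM neither showed detectable inhibition.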

After the completion of the appropriate number of cycles, 
the synthesized oligonucleotide may be used in some appli- 
cations without purification. Alternatively, if purification is 
required, this can be accomplished using known methods: 
centrifugation, extraction with organic solvents such as 
phenol, chloroform and ethyl ether; precipitation, e.g. using 
ethanol or isopropanol in the presence of high salt concen- 
tration; size exclusion, anion exchange, reverse phase, or 
thin layer chromatography; ultrafiltration or dialysis; gel 
electrophoresis; hybridization to a complementary oligo- 
nucleotide; or by an affinity ligand interaction, such as 
biotin-avidin. The oligonucleotide may also be attached to a 
solid support throughout its synthesis, e.g., via the primer, in 
which case final purification may be performed by washing 
the support. 

The method of the invention may also be used in
combination with other methods for synthesizing
oligonucleotides such that the method of the invention is used to make
a portion of the final oligonucleotide product. Such other
methods may include the blocked enzymatic method, the
uncontrolled enzymatic method, the branched enzymatic
method, chemical methods, transcription-based enzymatic
methods, template-based enzymatic methods, and
post-synthetic modification methods.

The method of the invention offers numerous advantages 
by operating in a mild aqueous system. The specificity of the 
enzymatic reactions obviates the need for base protecting 
groups, highly reactive functional groups, and harsh solvent 
conditions. All nucleotide and enzyme reagents are non- 
hazardous and are stable at room temperature in aqueous 
solution. In contrast, the phosphoramidite chemical method 
is encumbered with hazardous solvents, unstable nucle- 
otides, harsh acids and bases, and solid phase supports. 

PRIMERS 

The primer used in the first cycle of the method of the
invention, denoted as the "initial primer" herein, is an
oligonucleotide of length sufficient to be extended by the
chain extending enzyme. For example, if RNA ligase is the
chain extending enzyme, the length is usually at least
three bases. Primers for use in the invention can be made
using known chemical methods, including the
phosphoramidite method. Other methods include DNase or RNase
degradation of synthetic or naturally occurring DNA or
RNA. Numerous primers suitable for use in the invention are
commercially available from a variety of sources. The initial
primer may be selected to provide the first three bases of the
ultimate product, or it may be selected to provide facile
cleavage of some or all of the initial primer to yield the
desired ultimate product.

In most applications, the presence in the oligonucleotide
product of the 5'-extension corresponding to the initial
primer is inconsequential. These applications may include
DNA sequencing, polymerase chain reaction, and
hybridization. However, some applications may necessitate the
removal of all or part of this 5'-extension. Several
procedures have been designed to achieve this result. These
procedures are based on a structural or sequence difference
between the initial primer and the synthesized
oligonucleotide, such that an enzyme can detect the difference and
cleave the oligonucleotide into two fragments: an initial
primer fragment and the desired synthesized oligonucleotide
fragment. Such procedures preferably require only liquid
addition to the oligonucleotide solution, and can be
categorized by the type of synthesized oligonucleotide for which a
procedure can be used: oligodeoxyribonucleotides,
oligoribonucleotides, or both types.

OLIGODEOXYRIBONUCLEOTIDES: 

(1) Initial primers containing a 3' terminal ribose can be
cleaved off with either RNase or alkali. RNase, such as
RNase A or RNase One (Promega), hydrolyzes only at the
ribose bases of an oligonucleotide.

(2) Initial primers containing a 3' terminal deoxyuridine 
base can be cleaved off by incubation with Uracil DNA 
Glycosylase, followed by base catalyzed beta elimination. 
Stuart et al, Nucleic Acids Res., 15(18): 7451-62 (1987). 

OLIGORIBONUCLEOTIDES:

(1) Initial primers containing a 3' terminal deoxyribose
base can be cleaved off with DNase. Examples of
RNase-free DNases include DNase I and DNase II.

OLIGODEOXYRIBONUCLEOTIDES AND OLIGORI- 
BONUCLEOTIDES: 

(1) If the initial primer contains an appropriate
recognition sequence, then the initial primer can be cleaved off by
incubation with an appropriate ribozyme. Alternatively, the
initial primer can itself be a ribozyme containing the
ribozyme recognition sequence. Cleavage is performed by
adjusting reaction conditions or adding a necessary cofactor
to turn on the dormant ribozyme activity.

(2) If the initial primer contains an appropriate
recognition sequence, then the initial primer can be cleaved off by
incubation with an appropriate single-strand-recognizing
restriction endonuclease. Examples of such endonucleases
include Hha I, HinP I, Mnl I, Hae III, BstN I, Dde I, Hga I,
Hinf I, and Taq I (New England Biolabs catalog).

(3) If the initial primer contains a 3'-terminal 2'-O-methyl
ribose base, then the initial primer can be cleaved off by
incubation with RNase alpha (J. Norton et al., J. Biol. Chem.,
(1967), 242(9), 2029-34). RNase alpha cuts only at bases
containing a 2'-O-methyl ribose sugar.

(4) If the initial primer is composed of some ribose bases, 
an oligodeoxyribonucleotide specifically annealing to the 
initial primer and RNase H can be added to cleave off the 
initial primer. 

(5) If the initial primer is composed of some ribose bases,
an oligoribonucleotide specifically annealing to the initial
primer and a double strand specific RNase such as RNase V1
can be added to cleave the initial primer. If the initial primer
is self-annealing, addition of an annealing
oligoribonucleotide would not be necessary.

(6) An oligodeoxyribonucleotide may be added which
anneals to the initial primer and forms a double stranded
DNA region. The initial primer may then be cleaved by
addition of an appropriate restriction enzyme. The initial
primer can also be a self-annealing
oligodeoxyribonucleotide, obviating the need to add an annealing
oligodeoxyribonucleotide.

(7) If the initial primer contains a unique ribose base
absent from the synthesized oligonucleotide, then the initial
primer can be cleaved by incubation with an appropriate
base-specific Ribonuclease. Examples include RNase CL3
(cleaves after cytosine only), RNase T1 (cleaves after
guanosine only), and RNase U2 (cleaves after adenosine only).

(8) If the synthesized oligonucleotide contains at least one
phosphorothioate internucleotidic linkage, and the initial
primer does not contain any phosphorothioate
internucleotidic linkages, then the initial primer can be cleaved off by
incubation with an appropriate nuclease or 5'->3'
exonuclease, which is unable to hydrolyze phosphorothioate
internucleotidic linkages, or hydrolyzes them poorly.
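
Procedure (7) above amounts to a set difference followed by a lookup. The sketch below is a simplification under stated assumptions: every base of the initial primer is treated as a ribose base, the enzyme table holds only the three examples named in item (7), and the function name candidate_rnases is hypothetical.

```python
# Base-specific ribonucleases named in item (7): enzyme cleaves after this base.
BASE_SPECIFIC_RNASES = {
    "C": "RNase CL3",
    "G": "RNase T1",
    "A": "RNase U2",
}


def candidate_rnases(initial_primer, synthesized):
    """Map each primer-only base to a base-specific RNase, where one is listed."""
    primer_only = set(initial_primer.upper()) - set(synthesized.upper())
    return {
        base: BASE_SPECIFIC_RNASES[base]
        for base in sorted(primer_only)
        if base in BASE_SPECIFIC_RNASES
    }


# Hypothetical example: G occurs in the primer but not in the synthesized region.
choice = candidate_rnases("UUG", "ACUUAC")
```

An empty result means no listed enzyme separates the two regions, and one of the other procedures would be needed.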

After cleaving off the initial primer from the synthesized 
oligonucleotide, the initial primer may be selectively 
degraded to nucleosides or nucleotides. This technique is 
based on the differential presence of a terminal phosphate 
monoester on the initial primer and on the synthesized 
oligonucleotide and the use of differential digestion with an 
appropriate exonuclease. Three techniques may be 
employed. 

If the cleavage results in a 5'-phosphate on the synthesized
oligonucleotide fragment and a 5'-hydroxyl on the initial
primer fragment, then subsequent incubation with spleen
phosphodiesterase II (a 5' to 3' exonuclease) will selectively
hydrolyze the initial primer fragment to nucleotides. The
5'-phosphate protects the synthesized oligonucleotide from
hydrolysis.

If the cleavage results in a 3'-hydroxyl group on the initial 
primer fragment and a 3'-phosphate on the synthesized 
oligonucleotide fragment, the initial primer fragment can be 
degraded using a 3' to 5' exonuclease. This can be accom-
plished by cleaving off the initial primer prior to the removal
of the terminal 3'-phosphate blocking group from the syn-
thesized oligonucleotide. Suitable exonucleases include
exonuclease I, phosphodiesterase I and polynucleotide phos-
phorylase.

If the cleavage results in a 5'-hydroxyl group on the 
synthesized oligonucleotide fragment and a 5'-phosphate on 
the initial primer, then the initial primer fragment can be 
degraded using a 5' to 3' exonuclease with a substantial 
preference for 5'-phosphate substrates such as lambda exo- 
nuclease. This can be accomplished by phosphorylating the 
oligonucleotide at the 5'-end prior to cleavage, e.g. using 
polynucleotide kinase. 
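
The three cases above amount to a lookup from the termini
left by cleavage to a direction of exonuclease digestion. A
minimal Python sketch of that lookup follows. The names
Digestion and choose_digestion and the string labels are
illustrative assumptions, not terms used in this
specification; only the enzymes themselves are taken from
the text.

```python
from typing import NamedTuple, Tuple


class Digestion(NamedTuple):
    direction: str             # direction of exonuclease digestion
    enzymes: Tuple[str, ...]   # enzymes named in the text for this case


# Keys: (terminus, group on the initial primer fragment,
#        group on the synthesized oligonucleotide fragment)
_CASES = {
    ("5'", "hydroxyl", "phosphate"): Digestion(
        "5' to 3'", ("spleen phosphodiesterase II",)),
    ("3'", "hydroxyl", "phosphate"): Digestion(
        "3' to 5'", ("exonuclease I", "phosphodiesterase I",
                     "polynucleotide phosphorylase")),
    ("5'", "phosphate", "hydroxyl"): Digestion(
        "5' to 3'", ("lambda exonuclease",)),
}


def choose_digestion(terminus: str, primer_group: str,
                     product_group: str) -> Digestion:
    """Return the digestion described for the given cleavage termini."""
    try:
        return _CASES[(terminus, primer_group, product_group)]
    except KeyError:
        # Any other combination is not covered by the three techniques.
        raise ValueError("combination not described in the text") from None
```

A combination outside the three described cases raises an
error rather than guessing at an enzyme.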

The cleavage of the oligonucleotide and digestion of the 
initial primer can be performed at any cycle of the synthesis. 
For bulk synthesis of a single oligonucleotide, it is prefer- 
ably performed at the end of the synthesis. For synthesis of 
multiple oligonucleotides simultaneously, it is preferably 
performed after synthesizing the first three bases of the
oligonucleotide. Further, it will be appreciated that the
cleavage does not necessarily need to occur at the junction
of the initial primer region and the synthesized oligonucle-
otide region.

CHAIN EXTENDING ENZYME 

The chain extending enzyme used in the method of the
invention is preferably RNA ligase. RNA ligase is commer-
cially available from numerous suppliers and has been well
characterized in the literature. The reactions catalyzed by
RNA ligase relevant to the invention are shown in FIGS. 5A
and B.

RNA ligase possesses a number of properties which make 
it particularly useful in the invention: 

(1) The coupling reaction catalyzed by RNA ligase is
thermodynamically favorable. In the presence of an
AMP inactivating enzyme, the coupling reaction is
irreversible.

(2) RNA ligase couples numerous nucleotide analogs,
allowing the synthesis of oligonucleotides containing
these analogs using the method of the invention. Modi-
fications include base analogs, sugar analogs, and inter-
nucleotide linkage analogs. Uhlenbeck et al., The
Enzymes, vol. XV, pp. 31-58, Academic Press (1982)
and Bryant et al., Biochemistry, 21:5877-85 (1982).

(3) RNA ligase couples both ribose and deoxyribose
nucleotides, allowing the synthesis of oligodeoxyribo- 
nucleotides, oligoribonucleotides, and mixed ribose/ 
deoxyribose oligonucleotides using the method of the 
invention. 

(4) RNA ligase nucleotide substrate can be up to two
bases in length in the method of the invention; i.e.,
App(d)N1p(d)N2p or p(d)N1p(d)N2p.

While RNA Ligase is the preferred chain extending
enzyme for use in the present invention, other enzymes are
within the scope of the invention. For example, because T4
RNA Ligase requires replenishment after each cycle due to
its thermal instability, further refinement of the method is
anticipated by the use of a thermostable RNA Ligase. A
thermostable RNA Ligase is workable since the presence of
RNA Ligase in other steps of a cycle is not deleterious. A
thermostable RNA Ligase could be added in the first cycle
and would not need replenishment throughout the oligo-
nucleotide synthesis, reducing the expense of RNA Ligase
per synthesis. Furthermore, a thermostable RNA Ligase with
activity at elevated temperatures (65° to 95° C.) may provide
the added benefit of reducing primer secondary structure
interference with the coupling reaction. Another potential
benefit of a thermostable enzyme is high activity at high
ionic strength. One probable source of a thermostable RNA
Ligase is thermophilic archaebacteria.

Man-made genetic mutants of T4 RNA Ligase useful in 
the invention without modification include a mutant version 
with the improved ability to extend an oligodeoxyribonucle- 
otide primer, and a mutant version which is not inactivated 
at elevated temperatures.

Several other enzymes are denoted in the literature as
"RNA Ligases", i.e., Transfer RNA Ligase and HeLa/Eu-
bacterial RNA Ligase. These enzymes differ from T4 RNA
Ligase in their substrate requirements in that they are
reported in the literature as unable to extend a primer
containing a 2'-hydroxyl, 3'-hydroxyl terminus. Conse-
quently, they are not considered as RNA Ligase in this
invention. Nevertheless, these other enzymes do have the
ability to act as chain extending enzymes within the scope of
the present invention.

Transfer RNA Ligase is reported in the scientific literature
to catalyze a reaction similar to T4 RNA Ligase, but
absolutely requiring a primer with a 2'-phosphate and 3'-hy-
droxyl terminus. Transfer RNA Ligase has been character-
ized in several eukaryotes, including yeast (Apostol et al., J.
Biol. Chem., 266:7445-55 (1991)) and wheat germ
(Schwartz et al., J. Biol. Chem., 258:8374-83 (1983)). Based
on the fact that it is essential in transfer RNA processing,
Transfer RNA Ligase should be ubiquitous in eukaryotes.
Transfer RNA Ligase is a single polypeptide containing
three distinct enzyme activities: ligase, cyclic phosphodi-
esterase, and 5'-polynucleotide kinase. It is the Ligase activ-
ity which catalyzes the ligation reaction described above for
Transfer RNA Ligase. Since these separate activities have
been mapped to separate locations on the polypeptide, it is
conceivable that a mutant (e.g. a deletion mutant) can be
constructed which contains only the ligase activity.

An embodiment of the method of the invention employing
Transfer RNA Ligase or the mutant form as a chain extend-
ing enzyme is shown in FIG. 6. Blocked nucleotide sub-
strate, AppN-2',3'-cyclic phosphate, is coupled to a primer-
2'-phosphate by the ligase. The second step is inactivation of
unreacted blocked nucleotide substrate with Nucleotide
Pyrophosphatase, e.g. snake venom phosphodiesterase I,
and removal of the 2'-phosphate with a Phosphatase, e.g.
Alkaline Phosphatase. (Phosphatase removal of 2'-phos- 
phate may be unnecessary). The Phosphodiesterase I also 
removes unextended primer chains. The third step is incu- 
bation with cyclic phosphodiesterase to remove the blocking 
group from the 3' end of the extended primer by converting 
the terminal 2',3'-cyclic phosphate to 2'-phosphate. Such a 
cyclic phosphodiesterase enzyme is one of the components 
of Transfer RNA Ligase, whose activity has been isolated by 
mutation. Apostol et al., J Biol Chem, 266:7445-7455 
(1991). The cycle is then repeated until the desired sequence 
is obtained. Conceivably, the nucleotide substrate reuse 
technique can also be implemented if Nucleotide Pyrophos- 
phatase is not added and the cyclic phosphodiesterase has 
the desired substrate selectivity. 

HeLa/Eubacterial RNA Ligase catalyzes the reaction: 
primer-2',3'-cyclic phosphate + 5'-hydroxyl-nucleotide sub-
strate -> primer-nucleotide, by direct nucleophilic attack of
the 5'-hydroxyl of the nucleotide substrate on the cyclic 
phosphate. The HeLa RNA Ligase forms a normal 3'-5'- 
phosphodiester linkage; the Eubacterial RNA Ligase forms 
an unusual 2'-5' phosphodiester linkage (Greer et al, Cell, 
vol 33, 899-906). An embodiment of the invention employ- 
ing HeLa or Eubacterial RNA Ligase as the chain extending 
enzyme is shown in FIG. 7. N-2'-phosphate, 3'-phospho-LG 
is used as the blocked nucleotide substrate, wherein LG is a 
good leaving group for nucleophilic displacement (such as 
dinitro-phenol or 5'-AMP) and the nucleoside N has a free 
5'-hydroxyl. The first step is HeLa or Eubacterial RNA 
Ligase incubation with a primer-2',3'-cyclic phosphate and 
blocked nucleotide substrate to form primer-blocked nucle- 
otide product. The second step is Phosphatase incubation to 
remove the 2'-phosphate protecting group. Spontaneously or
upon heating, the terminal 3'-phospho-LG will cyclize non- 
enzymatically to form 2',3'-cyclic phosphate. The cyclized 
unreacted nucleotide is probably a weaker or inactive sub- 
strate for the RNA Ligase in the next cycle. 

Terminal deoxynucleotidyl Transferase (TdT) is inca-
pable of coupling its corresponding 3'-phosphate nucleotide
substrate analog, dNTP-3'-phosphate. A suggestion has been
made in the literature for producing a mutant form of TdT
capable of coupling dNTP-3'-phosphate (Chang et al., CRC
Critical Reviews in Biochemistry, 21(1): 27-52). Such a
mutant form would be a useful chain extending enzyme for
the method of the invention.

NUCLEOTIDE SUBSTRATES 

The blocked nucleotide substrate employed in the method 
of the invention is selected for compatibility with the chain 
extending enzyme, but generally comprises an activated 
nucleotide and a blocking group. The blocking group is 
bonded to the nucleotide so as to block reaction of the 
3'-hydroxyl group of the nucleotide. Such a nucleotide 
substrate is referred to generally herein as a "3'-blocked 
nucleotide." 

As used herein, the term "3'-phosphate-blocked nucle-
otide" refers to nucleotides in which the hydroxyl group at
the 3'-position is blocked by the presence of a phosphate
containing moiety. Examples of 3'-phosphate-blocked
nucleotides in accordance with the invention are nucleoti-
dyl-3'-phosphate monoester, nucleotidyl-2',3'-cyclic phos-
phate, nucleotidyl-2'-phosphate monoester and nucleotidyl-
2' or 3'-alkylphosphate diester, and nucleotidyl-2' or
3'-pyrophosphate. Thiophosphate or other analogs of such
compounds can also be used, provided that the substitution
does not prevent dephosphorylation by the phosphatase.

When RNA ligase is employed as the chain extending
enzyme, the choice of substrate influences the course of the
reaction, as can be seen from a consideration of the follow-
ing reaction mechanism:

(1) E + ATP <--> E-AMP + pyrophosphate

(2) E-AMP + 3',5'-(d)NDP <--> E[App(d)Np]

(3) E[App(d)Np] + primer-3'-OH <--> (primer-p(d)N)-3'-
phosphate + AMP + E

wherein App is an adenosine diphosphate moiety and Np is
a 3'-phosphate blocked nucleoside moiety, preferably a
3'-phosphate monoester. The use of precursor nucleotides,
ATP + 3',5'-(d)NDP, results in a short lag period in the
coupling reaction in which the concentration of App(d)Np
must build up to sufficient levels in solution before step 3
can occur. The use of pre-activated nucleotide substrate,
App(d)Np, avoids a lag period, allowing step 3 to occur
instantly. Therefore, faster and more reliable RNA ligase
coupling can be achieved using pre-activated nucleotide
substrates.

The scientific literature documents that the adenylylated
enzyme is unable to catalyze step 3 of the reaction. The
addition of a small amount of 3',5'-(d)NDP, when using
pre-activated nucleotide substrate, App(d)Np, is believed by
the inventor to prevent RNA ligase from being irreversibly
inactivated by the reverse reaction of step 2. Consequently,
it is believed that the coupling reaction proceeds with greater
efficiency. The addition of a small amount of pyrophosphate
may perform the same function.
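
The lag period attributed above to precursor nucleotides can
be pictured with a toy kinetic model. The Python sketch
below is only an illustration under strong assumptions: both
the activation of the nucleotide and the coupling to the
primer are treated as irreversible pseudo-first-order steps,
and the rate constants k_activate and k_couple are
hypothetical values, not measurements from this
specification.

```python
from math import exp


def product_preactivated(t: float, k_couple: float) -> float:
    """Fraction coupled at time t when App(d)Np is supplied directly
    (one assumed pseudo-first-order step)."""
    return 1.0 - exp(-k_couple * t)


def product_from_precursors(t: float, k_activate: float,
                            k_couple: float) -> float:
    """Fraction coupled at time t when App(d)Np must first be formed
    from ATP + 3',5'-(d)NDP (two assumed consecutive first-order steps)."""
    if k_activate == k_couple:
        # Limiting form of the two-step expression for equal constants.
        return 1.0 - (1.0 + k_couple * t) * exp(-k_couple * t)
    numerator = (k_couple * exp(-k_activate * t)
                 - k_activate * exp(-k_couple * t))
    return 1.0 - numerator / (k_couple - k_activate)


if __name__ == "__main__":
    # Hypothetical rate constants in arbitrary reciprocal time units.
    for t in (0.1, 0.5, 1.0, 2.0):
        direct = product_preactivated(t, 1.0)
        lagged = product_from_precursors(t, 2.0, 1.0)
        print(f"t={t:.1f}  pre-activated={direct:.3f}  precursors={lagged:.3f}")
```

In this model the precursor route starts with zero slope,
which is the lag, while the pre-activated route begins
coupling at once.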

Pre-activated blocked nucleotides for use as substrates in
the method of the invention can be conveniently synthesized
in accordance with Example 1.

Other substrates which are coupled to the primer by the 
chain extending enzyme and which can be converted to an 
inert or slowly reacting product may also be employed.

DEBLOCKING ENZYMES

When the 3'-blocking group employed on the substrate is
a phosphate group, the enzyme employed to remove the
blocking group is a phosphatase. The principal function of
the phosphatase is the irreversible removal of the 3'-phos-
phate blocking group from the extended primer (allowing
subsequent RNA ligase coupling) and optionally, removal
from the nucleotide substrate (preventing subsequent RNA
ligase coupling). Careful selection of the phosphatase and
the reaction conditions allows either: (1) dephosphorylation
of both the extended primer and unreacted nucleotide sub-
strate when substrate is not to be reused; or (2) dephospho-
rylation of only the extended primer when substrate is to be
reused in the next cycle. Non-specific phosphatases such as
Alkaline Phosphatase and Acid Phosphatase are useful when
substrate reuse is not desired, as depicted in FIG. 2A;
specific 3'-Phosphatases such as T4 3'-Phosphatase and Rye
Grass 3'-Phosphatase are useful when substrate reuse is
desired.

Alkaline Phosphatase will hydrolyze any monoester phos-
phate. Its high activity, especially at elevated temperatures,
its substantial inability to degrade oligonucleotides, and its
ability to be denatured irreversibly at 95° C. make it a useful
deblocking enzyme in the invention. Alkaline phosphatase is
readily available commercially from intestine and from
bacteria. The inherent inorganic pyrophosphatase activity of
alkaline phosphatase, not present in T4 3'-phosphatase,
prevents a pyrophosphate build-up which may inhibit RNA
ligase.

Acid Phosphatase has been isolated from wheat, potato, 
milk, prostate and semen, and catalyzes the same reactions 
as Alkaline Phosphatase. Acid Phosphatase can substitute 
for Alkaline Phosphatase if the pH of the reaction solution 
is acidic. Alkaline phosphatase is the preferred deblocking 
enzyme, however, when substrate is not to be reused in the 
next cycle. 

The 3'-Phosphatases can be used either to dephosphory-
late the primer selectively or to dephosphorylate both the
primer and the nucleotide substrate depending on the reac- 
tion conditions selected. Low concentrations are used for 
selective dephosphorylation; high concentrations are used to 
dephosphorylate both. 

The technical challenge of selective dephosphorylation is
that it entails removal of the blocking group from the
primer-blocked nucleotide product without removal of the
blocking group from the unreacted blocked nucleotide sub-
strate. In the method of the invention using RNA Ligase as
the chain extending enzyme and AppNp as nucleotide sub-
strate, the technical difficulty is selectively removing the
3'-phosphate blocking group of the extended primer, primer-
pN-3'-phosphate, without removing the 3'-phosphate of the
nucleotide substrate AppN-3'-phosphate. This difficulty is
exacerbated by the fact that primer-pN-3'-phosphate and
AppN-3'-phosphate are structurally identical with respect to
the 3'-phosphate group in that they both share the same
pN-3'-phosphate unit; the structural difference exists in a
region distant from the 3'-phosphate, the component con-
nected to the 5'-phosphate. This high degree of structural
similarity would seemingly make discriminating between
the substrates unachievable. Furthermore, the degree of
discrimination (selectivity) must be sufficiently high to make
a nucleotide substrate reuse technique useful. In the present
invention, this challenge is solved as a result of the discov-
ery that the enzyme 3'-Phosphatase is capable of achieving
the selective dephosphorylation and that it does so in a
manner which makes the invention useful.

3'-Phosphatase dephosphorylates only 2'- or 3'-phosphate
esters. Two 3'-Phosphatases are commercially available:
bacteriophage T4 and rye grass; both are useful in the
method of the invention. The T4 enzyme is a bifunctional
enzyme containing Polynucleotide Kinase and 3'-Phos-
phatase activities, catalyzed from two independent active
sites. The T4 enzyme is commonly sold as "Polynucleotide
Kinase". Since it is the 3'-phosphatase activity which is of
main relevance in this invention, this enzyme herein will be
referred to as T4 3'-Phosphatase. 3'-Phosphatase derived
from rye grass is sold commercially as "3'-Nucleotidase"
(Sigma Chemical, E. C. 3.1.3.6). This enzyme will also
be referred to in this specification as 3'-Phosphatase.
The method of the invention embodies any 3'-Phosphatase
with the aforementioned substrate selectivity.

Genetic mutants of T4 3'-Phosphatase which lack asso-
ciated kinase activity would also be useful in the invention.
This task has already been described in the literature. A
genetic mutant called pseT47 and a proteolytic fragment of
the enzyme have the 3'-Phosphatase activity, but no kinase
activity. Soltis et al., J. Biol. Chem. 257:11340-11345
(1982). Removal of the associated kinase activity may be
desirable in preventing oligonucleotide circularization or
polymerization. Other useful 3'-Phosphatases may be con-
structed by making genetic mutations which remove unde-
sirable associated enzyme activities.






Given that 3'-Phosphatase is probably widespread in
nature, it is anticipated that other 3'-Phosphatases derived
from other sources will display similar or perhaps superior
selective dephosphorylation and will also be useful in the
invention. Thus far, experiments performed by the inventor
have been unable to demonstrate that reuse of substrates can
be applied to deoxyribose substrates AppdNp, since it
appears that 3'-Phosphatase lacks the ability to selectively
dephosphorylate primer-pdNp without substantially dephos-
phorylating AppdNp. A corresponding 2'-deoxy-3'-phos-
phatase with the aforementioned selectivity would be useful
for AppdNp substrate reuse.

Special consideration is necessary for the method of
FIGS. 3A and B to avoid significant co-incubation of
3'-phosphatase activity and RNA ligase activity in the pres-
ence of primer+AppNp, which may result in uncontrolled
substrate addition. For example, RNA ligase may be heat
inactivated after use, or using a thermostable enzyme, the
RNA ligase activity can be temporarily turned off by low-
ering the temperature during the 3'-phosphatase incubation.

T4 3'-phosphatase has potential disadvantages with
respect to its use in the synthesis of non-repeat regions of an
oligonucleotide, as follows: (1) The 3'-phosphatase activity
on unreacted nucleotide substrate is substantially slower
than Alkaline Phosphatase; (2) AMP which is generated by
the RNA ligase coupling reaction is not hydrolyzed by
3'-phosphatase and its accumulation after many coupling
cycles may inhibit RNA ligase; and (3) associated kinase
activity may result in cyclization or polymerization of the
oligonucleotide if ATP is employed in the RNA ligase
coupling reaction. Thus, while T4 3'-phosphatase is useful
for all aspects of the method of the invention, the preferred
Phosphatase for synthesis of non-repeat regions is Alkaline
Phosphatase.

Other blocking groups which might be used in the method
of the invention include blocking groups which are removed
by light, in which case the addition of an enzyme to
accomplish the unblocking would be unnecessary. See Oht-
suka et al., Nucleic Acids Res., 6(2):443-54 (1979). Other
blocking groups include any chemical group covalently
attached to the 2'- or 3'-hydroxyl of App(d)N-3'-OH, which
can be removed without disrupting the remainder of the
oligonucleotide. This may include esters, sulfate esters,
glucose acetals, a heat labile group, or an acid or base labile
group, which can be removed by incubation with esterases
or proteases, sulfatases, glucosidases, heat, or acid or base,
respectively.

ADDITIONAL METHOD STEPS 

To synthesize long oligonucleotides, it is desirable to 
overcome two potential problems: the extension of the chain 
with unreacted nucleotide of the wrong type, and the sub- 
sequent extension of failed reaction products (unextended 
primer) from a previous cycle. These problems can be 
overcome by the addition of one or more additional enzymes 
to the basic scheme shown in FIG. 2A or 3A. 

When synthesizing long oligonucleotides, such as about 
25 bases or more, the unblocked nucleotide App(d)N con- 
centration may build up to an extent that it couples to the 
primer at an unacceptable level, despite the fact that it is far 
less reactive than App(d)Np substrate. To minimize the 
incorporation of such residual nucleotides from previous 
reaction cycles, an additional enzyme can be added during, 
after or prior to the unblocking step which is effective to 
further degrade unreacted nucleotide substrate or nucleotide
fragments into products that are no longer suitable substrates
for RNA ligase.

A suitable enzyme for this purpose is a Dinucleotide 
Pyrophosphate Degrading Enzyme. Five distinct enzymes 
are capable of degrading App(d)N or App(d)Np, as 
described in the scientific literature: 

(1) Nucleotide Pyrophosphatase (E. C. 3.6.1.9) 

(2) Acid Pyrophosphatase (Tobacco, Sigma Chemical Co.) 

(3) Diphosphopyridine Nucleosidase (E. C. 3.2.2.5)+ADP-
Ribose Pyrophosphatase (E. C. 3.6.1.13)

(4) Dinucleotide Pyrophosphate Deaminase (Kaplan et al., J.
Biol. Chem., 194: 579-91 (1952))

(5) Dinucleotide Pyrophosphate Pyrophosphorylase (A.
Kornberg, J. Biol. Chem., 182: 779-93 (1950))

These enzymes are suitable for this invention because the 
degradation products are not substrates for RNA ligase. 
Among the Dinucleotide Pyrophosphate Degrading
Enzymes, the preferred enzyme is Nucleotide Pyrophos-
phatase. This enzyme offers the following advantages: the
reaction is irreversible; the enzyme degrades both App(d)N
and App(d)Np; and nucleotide substrate is hydrolyzed to
nucleosides+PO4 when used with Alkaline Phosphatase.
This is advantageous since nucleosides and phosphate are
substantially non-inhibitory to all the enzymatic reactions of
the method. Precipitation of nucleosides as a result of
accumulation and poor solubility is probably beneficial by
making the nucleosides inert to all reactions of the oligo-
nucleotide synthesis, and by facilitating separation of the
nucleosides from the final oligonucleotide product by cen-
trifugation. The use of App(d)N1p(d)N2p as a nucleotide
substrate for RNA ligase requires the use of a Dinucleotide
Pyrophosphate Degrading enzyme and Alkaline Phos-
phatase to achieve inactivation for use in the method.

Nucleotide Pyrophosphatase has been isolated from a 
great number of sources: human fibroblasts, plasmacytomas, 
human placenta, seminal fluid, Haemophilus influenzae, 
yeast, mung bean, rat liver, and potato tubers. The source
with the best characterized enzymatic properties is potato
tubers (Bartkiewicz et al., Eur. J. Biochem., 143:419-26
(1984)). Bartkiewicz et al. have shown that purified enzyme
is capable of hydrolyzing dinucleotide pyrophosphates spe-
cifically, without hydrolyzing DNA or RNA. Nucleotide
Pyrophosphatase isolated from snake venom is commer-
cially available (Sigma Chemical Co.) and is the same
enzyme as Phosphodiesterase I (PDE-I). Accordingly,
PDE-I can also be used to convert unreacted nucleotides into
a form which does not serve as a substrate for RNA ligase.
However, the exonuclease activity of the PDE-I warrants
careful consideration since this activity may destroy the
oligonucleotide product and prior art does not teach the use
of exonucleases in a synthetic method.

The second potential difficulty with the method of the 
invention arises from a build up of failure sequences due to 
incomplete RNA ligase coupling. The RNA ligase coupling 
reaction can be substantially optimized kinetically in accor- 
dance with the invention. Dithiothreitol and TRITON X-100 
(octylphenoxy polyethoxy ethanol) greatly stimulate RNA 
ligase activity. Nevertheless, even under optimized condi- 
tions, the coupling reaction is not 100% efficient, resulting 
in primer chains which have not been coupled to the blocked 
nucleotide. If not removed, these unreacted primer chains 
will still be able to couple with nucleotide in the next 
coupling cycle. This will result in the accumulation of (n-1)
failure sequences in the final product mix. Two independent
solutions have been devised by the inventor to solve this
problem: Exonuclease treatment and Enzymatic Capping.

An exonuclease can be added after RNA ligase coupling
to hydrolyze uncoupled primer chains to (d)NMP's. The
Exonuclease can be utilized before, after, or concurrently
with the dinucleotide pyrophosphate degrading enzyme. The
Exonuclease used for this purpose should have the following
properties:

(1) hydrolyzes oligonucleotides in the 3' to 5' direction;
and

(2) hydrolyzes specifically oligonucleotides with a free
terminal 3'-hydroxyl group and is substantially unable
to hydrolyze oligonucleotides which are blocked at the
3'-end.

Primer chains which fail to couple during incubation with
RNA ligase differ from primer chains which do couple.
Uncoupled primers have a 3'-hydroxyl terminus; coupled
primers have a blocked 3'-phosphate. Therefore, as a result
of the selectivity of the Exonuclease, only uncoupled primer
chains are degraded to (d)NMP's. Exonuclease incubation
should be performed prior to incubation with Phosphatase,
and exonuclease activity should not be present during phos-
phatase incubation. Otherwise, oligonucleotide product will
be hydrolyzed.
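
To see why removal of uncoupled chains matters, the
build-up of failure sequences in the absence of any removal
step can be estimated with binomial arithmetic. The Python
sketch below assumes, purely for illustration, that every
cycle couples with the same independent efficiency and that
uncoupled chains are neither degraded nor capped. The
function names and the 95% efficiency are hypothetical and
are not taken from the examples.

```python
from math import comb


def fraction_with_misses(cycles: int, efficiency: float,
                         misses: int) -> float:
    """Fraction of chains that failed to couple in exactly `misses`
    of `cycles` cycles (binomial model, independence assumed)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (comb(cycles, misses)
            * efficiency ** (cycles - misses)
            * (1.0 - efficiency) ** misses)


def full_length_fraction(cycles: int, efficiency: float) -> float:
    """Fraction of chains that coupled in every cycle."""
    return fraction_with_misses(cycles, efficiency, 0)


if __name__ == "__main__":
    n, p = 20, 0.95   # hypothetical values for illustration only
    print(f"full length:    {full_length_fraction(n, p):.3f}")
    print(f"(n-1) failures: {fraction_with_misses(n, p, 1):.3f}")
```

With these assumed values the model gives roughly 36%
full-length chains and roughly 38% (n-1) sequences after 20
cycles, which illustrates why a selective exonuclease step
is attractive.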

Three enzymes satisfy these criteria and are suitable as
Exonuclease in this invention: Exonuclease I (E. coli),
Phosphodiesterase I (snake venom), and Polynucleotide
Phosphorylase. Phosphodiesterase I hydrolyzes both oligori-
bonucleotides and oligodeoxyribonucleotides; Exonuclease
I is substantially specific for oligodeoxyribonucleotides
(although it has been used successfully on mixed deoxyri-
bose/ribose oligonucleotides); Polynucleotide Phosphory-
lase is substantially specific for oligoribonucleotides. TRI-
TON X-100 and dithiothreitol have been observed
experimentally by the inventor to stimulate the activity of
Exonuclease I and PDE-I.

PDE-I offers two advantages: (1) PDE-I hydrolyzes both
oligoribonucleotides and oligodeoxyribonucleotides, mak-
ing it useful for the synthesis of both, and (2) PDE-I has
nucleotide pyrophosphatase activity. Although PDE-I
requires careful control of enzymatic reaction conditions to
avoid degrading primer chains blocked by a 3'-phosphate,
conditions can be achieved to hydrolyze all 3'-hydroxyl
primer chains and all unreacted blocked nucleotide substan-
tially without hydrolyzing 3'-phosphate primer chains.
Given that it is advantageous to use a Dinucleotide Pyro-
phosphate Degrading activity and an Exonuclease activity
simultaneously, snake venom PDE-I provides two functions
for the price of one enzyme.

The combination of these two modifications of the Basic
method results in the Preferred method for the synthesis of
oligonucleotides, outlined in FIGS. 2B and 3B. The power
of this method is exemplified in Example 5. ApApCpdApdA
is synthesized by two coupling cycles with the activated
nucleotide AppdAp and ApApCp initial primer. Thin layer
chromatography demonstrated that the reaction mixture at
the end of the synthesis contained only the oligonucleotide
ApApCpdApdA and the nucleosides adenosine and deoxy-
adenosine. The mixture was devoid of traces of n-1 and n-2
failure sequences. Due to the enormous size difference
between the n-mer oligonucleotide product and the nucleo-
sides, the oligonucleotide product can be easily purified.
Furthermore, an application may not require removal of the
nucleosides.
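
The bookkeeping behind the Preferred method can be
illustrated with a toy model. The Python sketch below assumes
a fixed per-cycle coupling efficiency and an ideal
exonuclease that degrades every chain left with a free
3'-hydroxyl while sparing every 3'-phosphate chain. Both are
simplifying assumptions, and the function name and the
numbers used are hypothetical rather than data from
Example 5.

```python
def simulate_preferred_method(cycles: int, efficiency: float):
    """Track chain populations through coupling + exonuclease cycles.

    Returns (full_length, failure_sequences, degraded) as fractions of
    the starting primer. Assumes an ideal exonuclease that removes all
    uncoupled (3'-hydroxyl) chains and leaves 3'-phosphate chains intact.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    growing = 1.0    # chains still being extended
    degraded = 0.0   # chains hydrolyzed by the exonuclease
    for _ in range(cycles):
        coupled = growing * efficiency
        uncoupled = growing - coupled
        degraded += uncoupled   # exonuclease step removes the failures
        growing = coupled       # phosphatase step unblocks the rest
    failure_sequences = 0.0     # none survive under these assumptions
    return growing, failure_sequences, degraded


if __name__ == "__main__":
    full, failures, lost = simulate_preferred_method(2, 0.9)
    print(f"full length={full:.2f} failures={failures:.2f} degraded={lost:.2f}")
```

Under these assumptions incomplete coupling lowers the yield
of full-length product but leaves no n-1 or n-2 sequences in
the mixture.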

As mentioned earlier, another technique can be used to
remove uncoupled primer chains, denoted herein as "Enzy-
matic Capping." After the RNA ligase coupling reaction,
unreacted primer chains can be capped with a chain termi-
nating nucleotide catalyzed by a transferase enzyme. The
capped chains are no longer substrates for coupling with
RNA ligase in subsequent coupling cycles. Primer chain
termination can be achieved with Terminal deoxynucleotidyl
Transferase+dideoxynucleoside triphosphate or with RNA
ligase+AppddN (the dideoxy analog of AppdN). Chain
terminated failure sequences can be subsequently hydro-
lyzed to nucleotides using an exonuclease as described
above. One potential disadvantage of the enzymatic capping
technique is the coupling efficiency of the chain terminating
step. If the coupling efficiency is low, then (n-1) failure
sequences will be present in the final solution mixture. Thus,
the favored method for removing uncoupled primer chains is
the Exonuclease method discussed earlier.

REMOVAL OF AMP 

AMP generated during the coupling reaction may inhibit 
the forward coupling reaction or participate in the reverse 
coupling reaction. In accordance with the invention, this can 
be avoided by the addition of an enzyme or enzyme com- 
bination which degrades AMP to a less inhibitory form. For 
the purpose of this invention, an AMP Inactivating Enzyme
or Enzyme Combination is defined as an enzyme or enzyme
combination which converts Adenosine 5'-Monophosphate
(AMP) to a less reactive form, i.e., to a form which is less
inhibitory to the forward coupling reaction catalyzed by
RNA Ligase, or which is less able to participate in the
reverse coupling reaction catalyzed by RNA Ligase, or
which assists in driving (thermodynamically or kinetically)
the forward coupling reaction catalyzed by RNA Ligase. An
AMP Inactivating Enzyme or Enzyme Combination is use-
ful in making the RNA Ligase coupling reaction faster, more
efficient, or more reliable, by converting AMP, generated by
the forward coupling reaction, to a form with diminished
undesirable properties.

Several AMP Inactivating Enzymes have been devised by
the inventor. These enzymes are preferably used concur-
rently with RNA Ligase incubation since they do not sub-
stantially degrade primer, extended primer product, or App-
(d)Np substrate. These enzymes can be present or can be
used at any or all steps of a cycle since their activity is not
deleterious to the One Pot method. Such enzymes include:

(1) 5'-Nucleotidase (E. C. 3.1.3.5):
AMP + H2O -> Adenosine + phosphate

(2) AMP Nucleosidase (E. C. 3.2.2.4):
AMP + H2O -> Adenine + ribose-5-phosphate

(3) AMP Deaminase (E. C. 3.5.4.6):
AMP + H2O -> Inosine-5'-phosphate + NH3

For clarity, FIG. 8 shows the structure of AMP and the 
location of the covalent bond broken by the hydrolytic 
activity of each enzyme. Experiments by the inventor 
strongly suggest that the hydrolytic products of these 
enzymes are less inhibitory to RNA Ligase than AMP.
Furthermore, it is strongly suspected that these hydrolytic
products are unable to participate in the reverse RNA Ligase
coupling reaction. Example 19 demonstrates the use of these 
enzymes. 

These three enzymes using AMP substrate may be com- 
bined in a rational manner with other enzymes, which 
further convert their products to even less reactive products, 
to create an AMP Inactivating Enzyme Combination. Such 
enzymes include: 

(1) Adenosine Nucleosidase (E. C. 3.2.2.7):
Adenosine + H2O -> Adenine + ribose

(2) Adenosine Deaminase (E. C. 3.5.4.4):
Adenosine + H2O -> Inosine + NH3

(3) Nucleoside Phosphorylase (E. C. 2.4.2.1):
Adenosine + PO4 -> ribose-1-phosphate + Adenine

Example 19 demonstrates the enzyme combination
5'-Nucleotidase+Adenosine Deaminase. Other potentially
useful combinations, such as 5'-Nucleotidase+Adenosine
Nucleosidase, can be constructed by identifying the side
product which one wishes to convert to a less reactive form
and consulting Enzyme Nomenclature (Academic Press,
1992) or the scientific literature to locate an enzyme which
effects the conversion. For example, to remove adenine,
consultation with Enzyme Nomenclature discloses the
enzyme Adenine Deaminase (E. C. 3.5.4.2) which converts
adenine to hypoxanthine, which may be suitable for inclusion
in an enzyme combination. Similarly, uridine can be
converted to uracil by adding Uridine Nucleosidase (E. C.
3.2.2.3).
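
As an illustrative sketch only, and not part of the disclosed method, the reasoning used to build an enzyme combination can be traced in Python. The reaction table transcribes the reactions listed above with water omitted. The function name `end_products`, the lowercase metabolite names, and the NH3 co-product shown for Adenine Deaminase are assumptions added for illustration. Nucleoside Phosphorylase is left out because it requires phosphate as a second substrate, and when two listed enzymes compete for the same substrate the sketch simply lets the first one in the list act.

```python
# Reaction table: enzyme -> (substrate, products); water is omitted.
REACTIONS = {
    "5'-Nucleotidase": ("AMP", ["adenosine", "phosphate"]),
    "AMP Nucleosidase": ("AMP", ["adenine", "ribose-5-phosphate"]),
    "AMP Deaminase": ("AMP", ["inosine-5'-phosphate", "NH3"]),
    "Adenosine Nucleosidase": ("adenosine", ["adenine", "ribose"]),
    "Adenosine Deaminase": ("adenosine", ["inosine", "NH3"]),
    "Adenine Deaminase": ("adenine", ["hypoxanthine", "NH3"]),  # NH3 assumed
}


def end_products(enzymes, start="AMP"):
    """Return the sorted species left once no listed enzyme can act."""
    pool = [start]
    changed = True
    while changed:
        changed = False
        for enzyme in enzymes:
            substrate, products = REACTIONS[enzyme]
            if substrate in pool:
                # Convert one equivalent of substrate into its products.
                pool.remove(substrate)
                pool.extend(products)
                changed = True
    return sorted(pool)


if __name__ == "__main__":
    print(end_products(["5'-Nucleotidase", "Adenosine Deaminase"]))
```

For the combination of Example 19 the sketch ends with inosine, NH3 and phosphate, so no AMP or adenosine remains in the modelled pool.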

AMP Nucleosidase and AMP Deaminase are reported in
the literature as allosterically activated by ATP and allosterically
inactivated by phosphate. Experiments indicate that
these enzymes have adequate activity in the absence of ATP
and under the conditions employed for oligonucleotide
synthesis demonstrated in Example 19. A thermostable
version is probably obtainable from a thermophilic organism,
e.g. Thermus aquaticus, Pyrococcus, etc., and would be
useful in the method since replenishment would be unnecessary.

The concept of an AMP Inactivating Enzyme or Enzyme
Combination as a useful technique in the method of the
invention is not limited to the enzymes disclosed in this
specification, but shall include any enzyme which can be
implemented for the previously stated purpose. Such
enzymes may already be described in the literature, may be
discovered in the future, or may be a man-made genetic
modification of a known AMP Inactivating Enzyme. For
example, a mutant of either AMP Nucleosidase or AMP
Deaminase with constitutively high activity would be useful.
Many examples exist in the literature in which mutations
affect allosteric enzyme properties.

While the foregoing describes the basic aspects of the
claimed invention, it will be appreciated that numerous
modifications are possible without departing from the basic
invention.

In practicing the method of the invention, enzyme inac- 
tivation where needed can be readily accomplished using 
heat or by proteolysis with a protease, e.g., proteinase K. 
Protease can be subsequently inactivated by heat or by 
chemical inhibitor such as phenylmethylsulfonyl chloride. 

Proteolysis with proteinase K can also be used to hydrolyze
the denatured protein debris, which accumulates as a
result of heat inactivation of enzymes, to small soluble
peptides. Although the debris is inert, its accumulation after
many cycles may pose a viscosity problem for mixing or
pipetting operations. The proteolytic digestion may be
enhanced by the addition of TRITON X-100. Physical
methods for removing the debris such as filtration, ultrafiltration,
centrifugation, and extraction with organic solvents
such as phenol and chloroform can also be utilized, but are
not readily automated and are more appropriate as an option
at the end of the synthesis.

The method of the invention is particularly well adapted
to the synthesis of oligoribonucleotides. It can also be used
to synthesize oligodeoxyribonucleotides, although coupling
times will be longer and coupling efficiencies will be lower.
For most applications an oligoribonucleotide can substitute
for an oligodeoxyribonucleotide with equal effectiveness.
Oligoribonucleotides can be used as hybridization probes; as
primers for dideoxy DNA sequencing (RNase can remove
the primer prior to electrophoresis); as primers for the
polymerase chain reaction using a thermostable reverse
transcriptase; and as probes for the ligase chain reaction.

For applications which have an absolute requirement for
oligodeoxyribonucleotides, an oligoribonucleotide may be 
converted to its complementary oligodeoxyribonucleotide. 
The oligoribonucleotide can be synthesized with a hairpin at 
the 3'-end, allowing priming for reverse transcriptase, and 
subsequent RNase H digestion. 
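
The relationship between the synthesized oligoribonucleotide and the oligodeoxyribonucleotide obtained by reverse transcription can be sketched as follows. This is a minimal illustration assuming a plain A/C/G/U template written 5' to 3'; the function name `complementary_dna` is hypothetical, and the 3' hairpin used for priming is ignored.

```python
# Watson-Crick partners for an RNA template copied into DNA.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def complementary_dna(rna):
    """Return the 5'->3' DNA strand complementary to a 5'->3' RNA."""
    # Reverse transcription reads the template 3'->5', so the
    # template is reversed before each base is complemented.
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


if __name__ == "__main__":
    print(complementary_dna("AUGC"))
```

Because the product is the complement rather than a copy, the oligoribonucleotide would have to be designed as the complement of the oligodeoxyribonucleotide that is ultimately wanted.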

Large scale manufacture of enzymes employed in the
present invention having suitable purity may be accomplished
by established methods for expression of recombinant
protein in an overproducing organism. One such technique
is to manufacture the enzymes as fusion proteins with
an affinity protein, allowing purification in one step by
affinity chromatography. As the activity of many enzymes is
not affected by the presence of the affinity protein, proteolytic
removal of the affinity protein is probably not
necessary.

Alternative embodiments of the present invention may be 
implemented to reduce the cost of enzymes. For example, 
instead of inactivating the enzymes by heat or proteolysis,
enzymes may be recovered from the oligonucleotide solution
by passing the solution through an enzyme-binding
solid support, such as an affinity chromatography column,
and then optionally reused in later cycles of the invention.
Alternatively, enzymes may be covalently attached to a solid 
support matrix and placed in columns. The method of the 
invention is then performed by pumping solutions through 
the appropriate columns. 

The hydrolysis of phosphate anhydrides by Alkaline 
Phosphatase and Nucleotide Pyrophosphatase, and the 
hydrolysis of phosphodiesters by Exonuclease each release an
equivalent of acid. Preventing an unacceptable drop in pH, 
especially for long oligonucleotides, may entail the occa- 
sional addition of base or the use of a higher buffer con- 
centration. 

Phosphate concentrations exceeding about 20 mM at pH
8.0 and 10 mM MgCl2 may eventually precipitate the
magnesium. This is deleterious since magnesium is a
required cofactor for many of the enzymes in the One Pot
method. This problem can be solved by conducting the
synthesis at pH 7.0. Experiments confirm that no precipitation
of MgPO4 is observed in a solution of 10 mM MgCl2
and 250 mM PO4 at pH 7.0. Alternatively, phosphate can be
removed by precipitation out of solution by adding an excess
of Mg2+, Ca2+, Al3+ or other cationic species which forms
an insoluble phosphate salt. Hydrolysis of pyrophosphate by
Inorganic Pyrophosphatase prevents precipitation of magnesium
pyrophosphate, which is highly insoluble in aqueous
solutions.

Growth of microorganisms in the reaction mixture may
result in the secretion of nucleases which could degrade the
nucleotides and oligonucleotides. This problem is minimized
by the frequent heat inactivation steps which sterilize
the reaction solution and the use of the detergent TRITON
X-100 which may hinder most microbes. Alternatively,
microbial growth inhibitors, such as glycerol, EDTA,
sodium azide, merthiolate, or antibiotics may be added to the
reaction solution. A useful growth inhibitor for the method
of the invention should not significantly inhibit the enzymatic
reactions in the synthesis of the oligonucleotide. No
significant inhibition of RNA ligase was observed by the
inventor in the presence of 0.1% sodium azide and 0.1%
merthiolate.

Inadvertent nuclease contamination of the synthesis reac- 
tion can be countered by adding a nuclease inhibitor or 
adding protease intermittently. Numerous RNase inhibitors 
are described in the literature, including RNase Inhibitor
Protein (human placenta) and Vanadyl Ribonucleoside 
Complexes (Sigma Chemical Co). No significant inhibition 
of RNA ligase was observed by the inventor in the presence 
of 0.1 mM vanadyl ribonucleoside complexes. 

Evaporative loss can be minimized by reducing the tem- 
perature or duration of the heat inactivation steps or by 
overlaying the aqueous phase with light mineral oil. For 
example, snake venom PDE-I can be inactivated by heating 
at 50° C. for 5 minutes; commercially available heat labile 
alkaline phosphatase from Arctic fish can be inactivated at 
65° C. 

Consumption by oxidation of dithiothreitol, or of other
reducing agents that stimulate the activity of the enzymes
used in the One Pot method, may be countered by either
intermittent replenishment or by conducting the synthesis in
an oxygen-free environment.

The formation of secondary structure in an oligonucle- 
otide may block enzymatic access to the 3'-end of the 
oligonucleotide. Several measures may be taken. The oli- 
gonucleotide can be synthesized as several smaller pieces 
which do not self anneal and then ligated together with RNA 
ligase. Alternatively, the base portion of a nucleotide can be 
modified with protecting groups such as acetyl groups which 
prevent base pairing. The protecting groups are removed at
the end of the synthesis. A third alternative is the addition of 
denaturants to the reaction mixture which disrupt oligo- 
nucleotide base pairing without substantially inhibiting the 
enzymatic reactions. Suitable denaturants include dimethyl 
sulfoxide, formamide, methylmercuric hydroxide and gly- 
oxal. No significant inhibition of RNA ligase was observed 
by the inventor in the presence of 20% dimethyl sulfoxide. 

APPARATUS 

The minimal configuration for an apparatus which is 
useful for synthesizing oligonucleotides by the method of 
the invention is: (1) at least one vessel containing reaction 
solution for performing the synthesis of an oligonucleotide, 
(2) means for controlling the temperature of the reaction 
solution(s), (3) means for separately supplying at least four 
different blocked nucleotide feed stocks to the solution(s), 
(4) means for supplying at least one enzyme feed stock to the 
solution(s), and (5) means for controlling the sequential 
addition of blocked nucleotide feed stocks and enzyme feed 
stock(s) to the solution(s). Two separate embodiments of the 
minimal configuration are described. 
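
Element (5) of the minimal configuration, the sequential addition of feed stocks, can be sketched as a planning routine. This is a hedged illustration rather than the controller of either embodiment: the step order follows the two-step cycle of Example 2, and the names `Step` and `plan_synthesis`, the feed stock labels, and the default incubation times are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str           # "add", "incubate" or "heat"
    detail: str           # reagent name or temperature
    minutes: float = 0.0  # duration; zero for additions


def plan_synthesis(extension, ligase_min=15, phosphatase_min=30):
    """List the ordered steps that extend a primer by `extension`."""
    steps = []
    for base in extension:
        if base not in "ACGU":
            raise ValueError(f"unsupported nucleotide: {base!r}")
        steps += [
            Step("add", f"App{base}p feed stock"),
            Step("add", "RNA ligase"),
            Step("incubate", "37 C", ligase_min),
            Step("heat", "95 C", 5),            # inactivate the ligase
            Step("add", "alkaline phosphatase"),
            Step("incubate", "37 C", phosphatase_min),
            Step("heat", "95 C", 5),            # inactivate the phosphatase
        ]
    return steps


if __name__ == "__main__":
    for step in plan_synthesis("AA"):
        print(step)
```

A real controller would also have to track volumes, select among the enzyme feed stocks, and wait on the measured temperature rather than on fixed times.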

FIG. 9 shows an apparatus which can be used in the 
practice of the invention for synthesizing many oligonucle- 
otides simultaneously. The apparatus has a plurality of 
reaction vessels in the form of wells 2 drilled in a metal 
block 1. At least four different blocked nucleotide feed 
stocks and at least one enzyme feed stock are provided from 
reagent bottles 4 using one or several liquid handling robots 
3. The temperature of the block can be increased by turning 
on a heating element (not shown) beneath the block and can 
be lowered by opening a valve 6 which allows water 5 to 
flow through a cavity (not shown) underneath the block and 
then exit 7. A computer (not shown) controls the sequential 
addition of blocked nucleotides and enzyme(s) to the vessels 
and controls the temperature of the block.

This apparatus can be further improved by providing a
separate means for mixing the synthesis reaction solutions
without the need for the robotic liquid dispensing system to
mix reaction solutions. This can be accomplished by placing 
a magnetic stir bar or many small magnetic or paramagnetic 
particles in each of the wells in active use, and agitating the 
stir bars with a moving magnetic field. Wells may be coated 
with an inert material to avoid heavy metal contamination. 

FIG. 10 shows an apparatus which can be used in the 
practice of the invention for synthesizing a single oligonucleotide
in bulk quantity. It consists of a single large
vessel 53 for the synthesis reaction which is mixed by a 
stirring device. The stirring device may be a motor 51 
connected to a rotating impeller 52, or alternatively a large 
stir bar (not shown) rotated by a magnetic stirrer (not 
shown). The temperature of the reaction solution is 
increased with a heating device 54 or a heating element (not 
shown) located inside cavity 60, and lowered by opening a 
valve 59 which allows cool water 58 to flow into a cavity 60 
beneath the vessel and then exit 61 the cavity. The four 
blocked nucleotide feed stocks 63 are added to the vessel 
either by four separate pumps (not shown) or by a single 
pump with a valve controlling connection of the feed stocks 
to the pump (not shown). At least one enzyme feed stock 64 
can be added in the same manner. A computer (not shown) 
controls the sequential addition of blocked nucleotides and 
enzyme(s) to the vessel and controls the temperature of the 
solution. 

Additional components could enhance the performance of 
the bulk scale synthesizer. Ancillary feed stocks 65 for
additional blocked nucleotides, enzymes, or other reagents
can be added. The temperature of the reaction solution is
monitored by a temperature probe 55. A pH probe 56 
monitors the reaction solution pH and acid or base feed 
stocks 62 can be added as necessary to maintain pH as 
desired. An inert gas such as nitrogen is slowly added via 
tube 57 to the reaction solution to remove oxygen (which 
can be monitored by an oxygen electrode). A computer (not 
shown) can control the apparatus, receiving inputs of solution
temperature and pH, and sending outputs to control the
addition of feed stocks (blocked nucleotide feed stocks, 
enzyme feed stock(s), acid, base, and ancillary reagents), 
heating device, cooling valve 59, nitrogen purge rate, and 
motor rotation speed. Nucleoside and phosphate by-products 
may be reduced by adding a dialysis or ultrafiltration system 
(not shown). 
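
The pH maintenance described above can be illustrated with a simple dead-band decision rule. This is only a sketch under assumed values: the target of pH 8.0 matches the buffer used in most of the examples, while the tolerance of 0.2 pH units and the function name `ph_correction` are hypothetical and do not come from the specification.

```python
def ph_correction(measured_ph, target_ph=8.0, tolerance=0.2):
    """Decide which feed stock, if any, the controller should add."""
    if measured_ph < target_ph - tolerance:
        return "add base"   # hydrolysis steps release acid
    if measured_ph > target_ph + tolerance:
        return "add acid"
    return "hold"           # inside the dead band: no addition


if __name__ == "__main__":
    for reading in (7.5, 8.1, 8.5):
        print(reading, ph_correction(reading))
```

The dead band keeps the controller from alternating between acid and base additions on small fluctuations of the probe reading.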

REAGENTS

Several reagents useful in the practice of the invention 
have not been previously described, and these reagents are 
an aspect of the present invention. In particular, the activated 
deoxyribonucleotides AppdAp, AppdGp and AppdCp; and 
dinucleotides of the general formula App(d)N1p(d)N2p,
wherein N1 and N2 are any nucleosides.

The activated deoxyribonucleotides can be synthesized by 
phosphorylation of the 5' hydroxyl of the corresponding 
3'-dNMP using phosphatase free polynucleotide kinase and 
ATP, to yield 3',5'-dNDP. This is then activated in accordance
with Example 1.

The dinucleotides can be synthesized in several steps.
First, (d)N1p(d)N2p(d)N3 is synthesized chemically, for
example using the phosphoramidite method. This product is
then phosphorylated using ATP and Polynucleotide Kinase
to yield p(d)N1p(d)N2p(d)N3. The enzyme is then inactivated.
The phosphorylated material is then partially
digested, e.g. using RNase, DNase or a nuclease to yield
p(d)N1p(d)N2p. The enzyme activity is then removed using
protease followed by heat, after which the material is
activated as in Example 1. Activation of such dinucleotide
substrates is greatly accelerated by the presence of a primer.

While the method of invention can be described in terms
of a cycle of steps which result in synthesis of oligonucleotides,
certain aspects of the invention are independently
viewed as part of applicant's inventive concept. For
example, the application of an exonuclease to degrade any
oligonucleotide primer which was not extended is a useful
improvement in the context of any method for synthesizing
an oligonucleotide, wherein an oligonucleotide primer is
extended by coupling a blocked nucleotide to the 3'-end of the
primer, wherein primer-blocked nucleotide product is resistant
to exonuclease attack. Similarly, the application of a
transferase enzyme and a chain terminating nucleotide,
whereby any oligonucleotide primer which was not
extended is end-capped to render it unreactive to further
extension, is a useful improvement in the context of any method for
synthesizing an oligonucleotide wherein an oligonucleotide
primer in a reaction mixture is extended by coupling a
blocked nucleotide to the 3'-end of the primer, such that
primer-blocked nucleotide product is formed that is unreactive
with the transferase enzyme.

The method will now be further described by way of the 
following, non-limiting examples. 

EXAMPLE 1 

Enzymatic Synthesis of AppAp and AppdAp 

The synthesis of activated nucleotides, App(d)Np and
App(d)Np(d)Np, can be performed enzymatically using
RNA ligase and Inorganic Pyrophosphatase. This example
demonstrates the synthesis of AppAp and AppdAp; other
activated nucleotides can be synthesized in the same manner.

The following solution in a total volume of 300 ul was
placed in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM Dithiothreitol (DTT), 0.1% TRITON X-100,
11 mM 3',5'-ADP, 10 mM ATP, 0.1 units Inorganic Pyrophosphatase
(yeast, Sigma Chemical Co.), 80 units RNA
ligase (phage T4, New England Biolabs). For the synthesis
of AppdAp, 3',5'-dADP was used in place of 3',5'-ADP. This
solution was incubated at 37° C. for 40 hours. RNA ligase
was heat inactivated at 95° C. for 5 minutes. Residual ATP
was removed by adding 2 units Hexokinase (yeast, Sigma
Chemical Co.)+15 ul 200 mM glucose and incubating at 37°
C. for 1 hour. Hexokinase was heat inactivated at 95° C. for
5 minutes. The solution was cooled to room temperature and
pelleted at 12,000 g for 1 minute to remove the insoluble
protein debris. This final product was analyzed by thin layer
chromatography on silica using isobutyric acid:concentrated
ammonium hydroxide:water at 66:1:33 containing 0.04%
EDTA (hereinafter "butyric-TLC"). No ATP was detected;
the major product was App(d)Ap with a small amount of
3',5'-ADP present. AppAp and AppdAp prepared in this
manner were used in all the following examples.
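
The reagent quantities in this example can be checked with a short calculation. The helper name `micromoles` is hypothetical; the volumes and concentrations are those stated above. The sketch shows that the glucose added with Hexokinase (15 ul of 200 mM) equals the ATP initially present (300 ul of 10 mM), so glucose should not be limiting for removal of the residual ATP, and that 3',5'-ADP is supplied in 10% molar excess over ATP.

```python
def micromoles(volume_ul, conc_mm):
    """Amount in umol for a volume in ul at a concentration in mM."""
    # ul x mM gives nmol; dividing by 1000 converts nmol to umol.
    return volume_ul * conc_mm / 1000.0


if __name__ == "__main__":
    atp = micromoles(300, 10)      # ATP in the initial solution
    adp = micromoles(300, 11)      # 3',5'-ADP in the initial solution
    glucose = micromoles(15, 200)  # glucose added with Hexokinase
    print(f"ATP {atp:.2f} umol, 3',5'-ADP {adp:.2f} umol, "
          f"glucose {glucose:.2f} umol")
```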

EXAMPLE 2 

One Pot Synthesis of ApApCpApA 

The following solution was placed in a total volume of 40
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp. The following procedures were
performed:

cycle 1 

(a) Add 2 ul (40 units) RNA ligase (phage T4, New England 
Biolabs). Incubate at 37° C. for 15 minutes. Heat at 95° 
C. for 5 minutes, cool to room temperature. 

(b) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature.

cycle 2 - starting volume is 20 ul

(a) Add 10 ul 10 mM AppAp+1 ul RNA ligase. Incubate at
37° C. for 30 minutes. Heat at 95° C. for 5 minutes, cool 
to room temperature. 

(b) same as cycle 1 

Insoluble coagulated protein-debris was removed by pel- 
leting at 12,000 g for 1 min. The reaction mixture superna- 
tant was analyzed by thin layer chromatography using the 
SureCheck™ Oligonucleotide Kit (US Biochemicals) (here- 
inafter "USB TLC"). The only oligonucleotide product 
visible on the TLC plate was the desired oligonucleotide 
product ApApCpApA; i.e., no n-2, n-1, n+1, n+2, etc.
products were formed. This experiment demonstrates that 
AppA does not participate in the RNA ligase coupling 
reaction, due to its slow coupling rate relative to AppAp. 
This experiment also demonstrates that coupling times with 
efficiencies approaching 100% can be achieved in 15 min- 
utes under these experimental conditions. This is attributable 
to the nucleotide 3',5'-ADP present in the AppAp prepara- 
tion, which prevents covalent inactivation of RNA ligase. 
The final yield of oligonucleotide product approached 
100%. 

EXAMPLE 3 
Synthesis of (ApApC)-pApA 

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp. The following procedures were 
performed: 

cycle 1 

(a) Add 1 ul (20 units) RNA ligase (phage T4, New England
Biolabs). Incubate at 37° C. for 1 hour. Heat at 95° C. for
5 minutes, cool to room temperature.

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma Chemical Co. P7383). Incubate at 37° C. 
for 30 minutes. Heat at 95° C. for 5 minutes, cool to room 
temperature. 

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature. 

cycle 2 

(a) Add 10 ul 10 mM AppAp+1 ul RNA ligase. Incubate at
37° C. for 5 hours. Heat at 95° C. for 5 minutes, cool to 
room temperature. 

(b) same as cycle 1 

(c) same as cycle 1 

Insoluble coagulated protein debris was removed by pellet- 
ing at 12,000 g for 5 min. USB TLC revealed pure ApA- 
pCpApA product with no visible n-1 or initial primer 
present. The yield of final product was about 90% of the
initial primer. 
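
An overall yield can be translated into an average per-cycle figure with a short calculation. This sketch assumes that every cycle proceeds with the same efficiency and that all losses are multiplicative, which the example does not establish; the function names are hypothetical.

```python
def stepwise_efficiency(overall_yield, cycles):
    """Average per-cycle efficiency implied by an overall yield."""
    return overall_yield ** (1.0 / cycles)


def projected_yield(efficiency, cycles):
    """Overall yield expected after `cycles` equally efficient cycles."""
    return efficiency ** cycles


if __name__ == "__main__":
    per_cycle = stepwise_efficiency(0.90, 2)  # two cycles, about 90% yield
    print(f"per-cycle efficiency: {per_cycle:.3f}")
    print(f"projected yield after 20 cycles: "
          f"{projected_yield(per_cycle, 20):.3f}")
```

Under these assumptions a yield of about 90% over two cycles corresponds to roughly 95% per cycle, and the same per-cycle figure compounded over 20 cycles would give a yield of about 35%.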

EXAMPLE 4 
Synthesis of (ApApC)-pApA

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp. The following procedures were 
performed:

cycle 1

Performed identically to cycle 1 of Example 3.

cycle 2

(a) Add 1.5 ul 100 mM ATP+3 ul 50 mM 3',5'-ADP+0.1 units
Inorganic Pyrophosphatase+1 ul RNA ligase. Incubate at
37° C. for 5 hours. Heat at 95° C. for 5 minutes, cool to
room temperature.

(b) same as cycle 1 of example 3 

(c) same as cycle 1 of example 3 

Insoluble coagulated protein debris was removed by pellet- 
ing at 12,000 g for 5 min. USB TLC revealed nearly pure 
ApApCpApA product with no visible n-1 or initial primer
present. The yield of final product was about 90% of the 
initial primer. 

EXAMPLE 5 

Synthesis of dApdA 

The oligonucleotide dApdA was synthesized by initially
synthesizing the oligonucleotide (ApApC)-pdApdA using
the initial primer ApApC and two coupling cycles with the
activated nucleotide AppdAp. Synthesized oligodeoxyribonucleotide
dApdA was cleaved from the initial primer using
RNase treatment.

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppdAp. The following procedures were 
performed: 

cycle 1 

(a) Add 1 ul (20 units) RNA ligase (phage T4, New England 
Biolabs). Incubate at 37° C. for 3 hours. Heat at 95° C. for 
5 minutes, cool to room temperature. 

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma P7383). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature. 

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature.

cycle 2

(a) Add 10 ul 10 mM AppdAp+1 ul RNA ligase. Incubate at
37° C. for 20 hours. Heat at 95° C. for 5 minutes, cool to 
room temperature. 

(b) same as cycle 1 

(c) same as cycle 1 

Insoluble coagulated protein debris was removed by pelleting
at 12,000 g for 5 min. USB TLC revealed pure ApApCpdApdA
product with no visible n-1 or initial primer
present. The yield of final product was about 90% of the
initial primer. Cleavage of the synthesized oligodeoxyribonucleotide
dApdA from the oligonucleotide product was
performed by adding 100 ng RNase A (bovine pancreas, US
Biochemicals) to 4 ul oligonucleotide product and incubating
at 37° C. for 1 hour. dApdA product was analyzed and
purified from nucleosides and ApApCp using butyric TLC.

EXAMPLE 6 

Synthesis of (ApApC)-pApA 

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp, containing 10% glycerol as a preservative.
The solution was overlaid with 50 ul light mineral oil
to prevent evaporation. The following procedures were
performed:

cycle 1

(a) Add 1 ul (20 units) RNA ligase (phage T4, New England 
Biolabs)+0.5 ul (0.2 units) Inorganic Pyrophosphatase 
(Sigma, yeast)+0.5 ul (0.025 units) 5'-Nucleotidase 
(Sigma, snake venom). Incubate at 37° C. for 1 hour. Heat 
at 95° C. for 5 minutes, cool to room temperature. 

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase 
(Sigma P7383, snake venom). Incubate at 37° C. for 30 
minutes. Heat at 95° C. for 5 minutes, cool to room 
temperature. 

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals)+0.5 ul (0.05 units) Nucleoside Phos- 
phorylase (Sigma). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature. 

cycle 2 

(a) Add 10 ul 10 mM AppAp+1 ul (20 units) RNA ligase. 
Incubate at 37° C. for 5.5 hours. Heat at 95° C. for 5 
minutes, cool to room temperature. 

(b) same as cycle 1 

(c) same as cycle 1 

Insoluble coagulated protein debris was removed by adding 
5 ug proteinase K (Sigma) and incubating at 60° C. for 5 
minutes. This treatment removed most of the debris. The 
proteinase K was heat inactivated at 95° C. for 5 minutes, 
then cooled to room temperature. Mineral oil was removed 
with a pipettor. Residual mineral oil was removed by adding
100 ul chloroform, vortexing vigorously, and centrifuging at
12,000 g for 1 minute to separate the phases. The chloroform
extraction also removed protein from the aqueous phase;
the protein appeared between the two phases. The upper aqueous
phase was collected by pipettor and was analyzed by USB 
TLC. This revealed pure ApApCpApA product with no 
visible n-1 or initial primer present. The yield of final 
product was about 90% of the initial primer.

EXAMPLE 7

Synthesis of (ApApC)-pApA

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp. The following procedures were 
performed: 

cycle 1

(a) Add 1 ul (20 units) RNA ligase (phage T4, New England 
Biolabs). Incubate at 37° C. for 1 hour. Add 1 ug Pro- 
teinase K (Sigma), incubate at 60° C. for 5 minutes, heat 
at 95° C. for 5 minutes to inactivate protease, and cool to 
room temperature. 

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma P7383). Incubate at 37° C. for 30 minutes. 
Add 1 ug Proteinase K (Sigma), incubate at 60° C. for 5 
minutes, heat at 95° C. for 5 minutes to inactivate 
protease, and cool to room temperature. 

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals). Incubate at 37° C. for 30 minutes. Add 
1 ug Proteinase K (Sigma), incubate at 60° C. for 5 
minutes, heat at 95° C. for 5 minutes to inactivate 
protease, and cool to room temperature. 

cycle 2

(a) Add 10 ul 10 mM AppAp+1 ul (20 units) RNA ligase.
Incubate at 37° C. for 5.5 hours. Add 1 ug Proteinase K
(Sigma), incubate at 60° C. for 5 minutes, heat at 95° C.
for 5 minutes to inactivate protease, and cool to room
temperature.

(b) same as cycle 1

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine,
US Biochemicals). Incubate at 37° C. for 30 minutes.

The use of Proteinase K for the inactivation of enzymes after
each step prevented the accumulation of insoluble coagulated
protein debris. USB TLC revealed pure ApApCpApA
product with no visible n-1 or initial primer
present. The yield of final product was about 90% of the
initial primer.

EXAMPLE 8

Synthesis of (ApApC)-pApA 

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp, containing 10% dimethylsulfoxide to
inhibit base pairing. The synthesis procedure was identical
to Example 3. USB TLC revealed pure ApApCpApA product
with no visible n-1 or initial primer present. The yield
of final product was about 90% of the initial primer.

EXAMPLE 9

Synthesis of (ApApC)-pApA

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, 5 mM AppAp, 10 uM Vanadyl Ribonucleoside
Complexes (to inhibit any contaminating RNases). The
synthesis procedure was identical to Example 3. USB TLC
revealed pure ApApCpApA product with no visible n-1 or
initial primer present. The yield of final product was about
90% of the initial primer.

EXAMPLE 10 

Synthesis of (ApApC)-pApA 

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC
primer, and 5 mM AppAp. The following procedures were
performed:

cycle 1

(a) Add 1 ul (20 units) RNA ligase (phage T4, New England
Biolabs)+1 ul 3 mM sodium pyrophosphate+1 ul 300 mM
glucose+0.2 units hexokinase (yeast, Sigma). Incubate at
37° C. for 1 hour. Heat at 95° C. for 5 minutes, cool to
room temperature.

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma P7383). Incubate at 37° C. for 30 minutes.
Heat at 95° C. for 5 minutes, cool to room temperature. 

(c) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine,
US Biochemicals). Incubate at 37° C. for 30 minutes.
Heat at 95° C. for 5 minutes, cool to room temperature.

cycle 2

(a) Add 10 ul 10 mM AppAp+1 ul (20 units) RNA ligase+1 
ul 3 mM sodium pyrophosphate+1 ul 300 mM glucose+ 
0.2 units hexokinase. Incubate at 37° C. for 3.5 hours.
Heat at 95° C. for 5 minutes, cool to room temperature. 

(b) same as cycle 1 



(c) same as cycle 1 

Insoluble coagulated protein debris was removed by pellet- 
ing at 12,000 g for 5 min. USB TLC revealed nearly pure 
ApApCpApA product with slight n-1 side product. 

EXAMPLE 11 

One Pot Synthesis of ApApC-pApA with TAP 

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM BES, pH 7.0, 10 mM MgCl2,
10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC primer, 
5 mM AppAp. The following procedures were performed: 

cycle 1 

(a) Add 2 ul (40 units) RNA Ligase (phage T4, New England 
Biolabs). Incubate at 37° C. for 2 hours. Heat at 95° C. for 
5 minutes, cool to room temperature. 

(b) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals)+1 ul (2 units) Tobacco Acid Pyrophosphatase
(Sigma). Incubate at 37° C. for 3.5 hours. Heat at
95° C. for 5 minutes, cool to room temperature. 

cycle 2 

(a) Add 10 ul 10 mM AppAp+1 ul RNA Ligase. Incubate at 
37° C. for 30 minutes. Heat at 95° C. for 5 minutes, cool 
to room temperature. 

(b) same as cycle 1 

Insoluble coagulated protein debris was removed by pellet- 
ing at 12,000 g for 1 min. The only oligonucleotide product 
visible by USB TLC was the desired oligonucleotide prod- 
uct ApApCpApA. The final yield of oligonucleotide product 
was nearly 100%. 

EXAMPLE 12 

Synthesis of ApApC-pdApdA Using TdT+ddATP
Capping 

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC 
primer, 5 mM AppdAp. The following procedures were 
performed: 

cycle 1 

(a) Add 1 ul (20 units) RNA Ligase (phage T4, New England 
Biolabs). Incubate at 37° C. for 3 hours. Heat at 95° C. for 
5 minutes, cool to room temperature.

(b) Add 1 ul Terminal deoxynucleotidyl Transferase (USB, 
17 units/ul)+3 ul 5 mM dideoxyadenosine 5'-triphosphate. 
Incubate at 37° C. for 2.5 hours. Add 1 ug Proteinase K, 
incubate at 60° C. for 15 minutes, heat at 95° C. for 5 min, 
cool to room temperature. 

(c) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma P7383). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature. 

(d) Add 1 ul (3 units) Alkaline Phosphatase (calf intestine, 
US Biochemicals). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature.

cycle 2

(a) Add 10 ul 10 mM AppdAp+1 ul RNA Ligase. Incubate 
at 37° C. for 15 hours. Heat at 95° C. for 5 minutes, cool 
to room temperature. 

(b) same as cycle 1 

(c) same as cycle 1 

(d) same as cycle 1 

Insoluble coagulated protein debris was removed by pellet- 
ing at 12,000 g for 5 min. The reaction mixture supernatant 
was analyzed by USB TLC. The only oligonucleotide prod-
uct visible was the desired product ApApCpdApdA. The
yield of final product was about 90% of the initial primer.

EXAMPLE 13 

Synthesis of ApApC-pApA Using Enzyme-Solid
Support Matrix

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% TRITON X-100, 1 mM ApApC 
primer, 5 mM AppAp. The following procedures were 
performed: 

cycle 1 

(a) Add 1 ul (20 units) RNA Ligase (phage T4, New England
Biolabs). Incubate at 37° C. for 1 hour. Heat at 95° C. for
5 minutes, cool to room temperature.

(b) Add 1 ul (0.03 units) Nucleotide Pyrophosphatase (snake 
venom, Sigma P7383). Incubate at 37° C. for 30 minutes. 
Heat at 95° C. for 5 minutes, cool to room temperature.

(c) Add 6 ul Alkaline Phosphatase-Acrylic Beads (calf
intestine, Sigma Chemical Co.). Incubate at 37° C. for 2.5
hours with occasional mixing. Remove CIAP-acrylic
beads by briefly pelleting. Heat supernatant at 95° C. for
5 minutes to remove any residual CIAP leakage, cool to
room temperature.

cycle 2

(a) Add 10 ul 10 mM AppAp+1 ul RNA Ligase. Incubate at
37° C. for 2 hours. Heat at 95° C. for 5 minutes, cool to
room temperature.

(b) same as cycle 1 

(c) same as cycle 1, except skip the heat inactivation.

Insoluble coagulated protein debris was removed by pellet-
ing at 12,000 g for 5 min. USB TLC revealed a mixture of
approximately 50% ApApCpA and 50% ApApCpApA
oligonucleotide product. The n-1 failure sequence was due to
the incomplete 3'-dephosphorylation of the oligonucleotide
in the first cycle. This example demonstrates that the
enzymes can be covalently attached to a solid matrix.
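
The origin of the n-1 product can be illustrated with a small bookkeeping model. The sketch below is not part of the original protocol. It assumes that chains carrying a 3'-phosphate cannot be extended, that coupling and deblocking in each cycle can each be summarized by a single efficiency value, and that chain length alone identifies the product. The function name and the efficiency values are hypothetical.

```python
def simulate_cycles(coupling, deblocking, start_length=3):
    """Track chain-length fractions through coupling/deblocking cycles.

    coupling[i]   -- assumed fraction of unblocked chains extended in cycle i
    deblocking[i] -- assumed fraction of 3'-blocked chains deblocked in cycle i
    Returns {chain_length: fraction}, ignoring the final blocking state.
    """
    unblocked = {start_length: 1.0}  # chains with a free 3'-hydroxyl
    blocked = {}                     # chains still carrying a 3'-phosphate
    for c, d in zip(coupling, deblocking):
        next_unblocked = {}
        next_blocked = dict(blocked)  # blocked chains cannot couple
        # Coupling step: unblocked chains gain one blocked nucleotide.
        for length, frac in unblocked.items():
            next_blocked[length + 1] = next_blocked.get(length + 1, 0.0) + frac * c
            next_unblocked[length] = next_unblocked.get(length, 0.0) + frac * (1 - c)
        # Deblocking step: a fraction d of blocked chains loses the 3'-phosphate.
        for length, frac in list(next_blocked.items()):
            next_unblocked[length] = next_unblocked.get(length, 0.0) + frac * d
            next_blocked[length] = frac * (1 - d)
        unblocked, blocked = next_unblocked, next_blocked
    totals = {}
    for pool in (unblocked, blocked):
        for length, frac in pool.items():
            if frac > 0:
                totals[length] = totals.get(length, 0.0) + frac
    return totals


# Assumed values: complete coupling, half of the chains deblocked in cycle 1.
print(simulate_cycles(coupling=[1.0, 1.0], deblocking=[0.5, 1.0]))
```

With these assumed inputs the model returns equal fractions of length 4 (ApApCpA) and length 5 (ApApCpApA), which is consistent with the roughly 50:50 mixture reported above. The 0.5 deblocking value is chosen to reproduce that observation and is not a measured efficiency.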

EXAMPLE 14 

The method of the invention can be used for synthesizing 
oligonucleotide mixtures in which two or more different
bases are used at a particular position. This technique is
known in the art as "wobbling" and is useful in hybridization 
applications of an oligonucleotide to a DNA library based on 
amino acid sequence. Wobbling is performed by adding a 
mixture of blocked nucleotide substrates instead of a single 
blocked nucleotide substrate during the RNA ligase step of
one cycle. The relative amounts of the blocked nucleotides
used are selected to balance out differences in coupling rate.
For example, if a 50:50 mix of A and G is desired, a mixture
of the nucleotide substrates AppAp and AppGp would be
added during the RNA ligase step of the appropriate reaction
cycle. If the reactivities of AppAp and AppGp are equal, the 
substrates would be used in equal amounts. 
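
The balancing described above can be sketched numerically. The snippet assumes a simple competitive model in which each substrate is incorporated in proportion to its mole fraction multiplied by its relative coupling rate. That model, the function name, and the example rate values are illustrative assumptions, not measured reactivities.

```python
def wobble_mix(target, relative_rate):
    """Return substrate mole fractions giving the target incorporation ratio.

    Assumes incorporation of substrate s is proportional to
    mole_fraction[s] * relative_rate[s] (hypothetical competitive model).
    """
    # A slower substrate needs proportionally more material in the mixture.
    weights = {s: target[s] / relative_rate[s] for s in target}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Equal reactivities: a 50:50 product mix calls for equal substrate amounts.
equal = wobble_mix({"AppAp": 0.5, "AppGp": 0.5}, {"AppAp": 1.0, "AppGp": 1.0})

# Hypothetical case in which AppGp couples half as fast as AppAp.
skewed = wobble_mix({"AppAp": 0.5, "AppGp": 0.5}, {"AppAp": 1.0, "AppGp": 0.5})

print(equal)
print(skewed)
```

In the second, hypothetical case the slower substrate makes up about two thirds of the mixture. Real coupling rates would have to be determined experimentally before choosing the ratio.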

EXAMPLE 15 

Synthesis of (ApApC)-pApA 

The same protocol was used as in Example 3, except that
after RNA Ligase coupling, the heat inactivation step in part
(a) of each cycle was omitted, and 20 units additional RNA
Ligase was added during each Alkaline Phosphatase diges-
tion. USB TLC revealed pure ApApCpApA product with no
visible n-1 or initial primer present. The yield of final
product was about 90% of the initial primer.

This example demonstrates that inactivation of the chain 
extending enzyme is not necessary. In addition, the use of a 
thermostable chain extending enzyme would obviate the 
need to add this enzyme after each cycle. This example also 
demonstrates that phosphodiesterase I incubation can be 
performed without prior inactivation of the chain extending 
enzyme. Optionally, phosphodiesterase I incubation can be 
performed in the presence of 5'-Nucleotidase to hydrolyze
AMP generated by phosphodiesterase I cleavage of App-
(d)Np.

EXAMPLE 16 

Synthesis of ApApCpApA with Substrate Reuse 

The oligonucleotide ApApCpApA was synthesized 
according to the following procedure. The following solu-
tion was placed in a total volume of 30 ul in an Eppendorf
tube: 50 mM Tris-Cl, pH 8.0, 10 mM MgCl2, 10 mM DTT, 
0.1% TRITON X-100, 1 mM ApApC initial primer, and 
Nucleotide Substrate. The following procedure was per- 
formed: 

cycle 1 

(a) Add 1 ul (20 units) T4 RNA Ligase (New England
Biolabs), incubate at 37 degrees C. for 3 hours, heat at 85
degrees C. for minutes, cool.

(b) Add 1 ul (3 units) T4 Polynucleotide Kinase (US
Biochemicals, contains 3'-Phosphatase), incubate at 37
degrees C. for 1 hour, heat at 85 degrees C. for 5 minutes,
cool. 

cycle 2 - starting volume is 20 ul 

(a) same as cycle 1. No AppAp substrate was added. 

(b) same as cycle 1. 

Sub-Example A: Nucleotide substrate was approximately 5 
mM AppAp. 

Sub-Example B: Nucleotide substrate was 5 mM 3',5'-ADP+
4.5 mM ATP. These precursors are converted to AppAp in
the first cycle by RNA Ligase. Supplementation with
inorganic pyrophosphatase in a separate experiment 
improved oligonucleotide product yield. 

USB TLC confirmed the formation of ApApCpApA product 
for both sub-examples. USB TLC also confirmed that no 
significant inactivated nucleotide substrate AppA was 
formed for both sub-examples. Approximately 5 ul oli- 
gonucleotide product was incubated with 100 ng RNase A 
(US Biochemicals) at 37° C. for about 15 minutes. RNase 
A is used as a base-specific RNase to cleave the oligo-
nucleotide 3' to the Cytidine base. Butyric TLC confirmed
the formation of ApA oligonucleotide product for both
sub-examples. Yield of oligonucleotide product was bet-
ter in sub-example A.
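
The RNase A check used above can be pictured as a simple string operation. The sketch below is an illustration only: it treats the oligonucleotide as a 5'-to-3' list of ribonucleotide bases and splits it after each pyrimidine. Including U as a cleavage site reflects the general pyrimidine specificity of RNase A rather than anything stated in this example, and the function name is hypothetical. Phosphate positions and deoxy residues are ignored.

```python
PYRIMIDINES = {"C", "U"}  # RNase A cleaves on the 3' side of pyrimidines


def rnase_a_fragments(bases):
    """Split a 5'->3' list of ribonucleotide bases after each pyrimidine."""
    fragments, current = [], []
    for base in bases:
        current.append(base)
        if base in PYRIMIDINES:
            # Cleavage occurs 3' to this base, closing the current fragment.
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)  # remaining 3'-terminal fragment
    return fragments


# ApApCpApA written as a plain base list.
print(rnase_a_fragments(list("AACAA")))
```

For ApApCpApA the sketch yields two fragments, AAC and AA. The second corresponds to the ApA fragment detected by butyric TLC, so its appearance indicates that two adenosines were added 3' to the cytidine of the primer.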

This experiment demonstrates reuse in the second cycle
of nucleotide substrate AppAp used in the first cycle. This 
was accomplished by using bacteriophage T4 3'-Phos- 
phatase under carefully controlled conditions to specifically 
remove the extended primer blocking group without signifi- 
cantly inactivating the nucleotide substrate AppAp. The high 
concentration of primer and nucleotide substrate used in this
example and the following examples is for the convenience
of allowing detection of product by thin layer chromatog-
raphy. Proportionately lower concentrations, such as 0.10
mM primer and 1.0 mM nucleotide substrate may be more
appropriate for long oligonucleotides to lessen the build up 
of side products. 
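
To make the concentrations concrete, the arithmetic below converts them into absolute amounts. It is a minimal sketch that assumes the 30 ul starting volume of this example also applies to the lower-concentration case, which the text does not state. The function name is hypothetical.

```python
def nmol_in_reaction(conc_mM, volume_ul):
    """Amount in nanomoles: 1 mM equals 1 nmol per ul, so nmol = mM * ul."""
    return conc_mM * volume_ul


VOLUME_UL = 30  # starting volume of this example; assumed for both cases

cases = [
    ("as used in this example", 1.0, 5.0),
    ("suggested lower concentrations", 0.10, 1.0),
]
for label, primer_mM, substrate_mM in cases:
    primer = nmol_in_reaction(primer_mM, VOLUME_UL)
    substrate = nmol_in_reaction(substrate_mM, VOLUME_UL)
    print(f"{label}: {primer:.1f} nmol primer, {substrate:.1f} nmol substrate, "
          f"substrate:primer = {substrate / primer:.0f}:1")
```

Under that assumption the example uses 30 nmol primer with 150 nmol AppAp, while the lower concentrations correspond to 3 nmol primer with 30 nmol substrate, a shift from a 5:1 to a 10:1 molar excess of substrate.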

EXAMPLE 17

Synthesis of ApApCpApA using Rye Grass 
3'-Phosphatase 

ApApCpApA was synthesized using the same procedure 
as Example 16, sub-example A, except 0.05 units 3'-Phos-
phatase from Rye Grass (Sigma, sold as 3'-Nucleotidase)
was used for 3 hours at 37 degrees C. in place of T4
Polynucleotide Kinase (3'-Phosphatase). Butyric TLC con-
firmed synthesis of product and RNase A digestion con-
firmed formation of ApA.

EXAMPLE 18

Synthesis of ApApCpApA using Preferred Mode
with Substrate Reuse

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% Triton X-100, 1 mM ApApC
initial primer, and 5 mM AppAp. The following procedure
was performed:

cycle 1

(a) Add 1 ul (20 units) T4 RNA Ligase (New England
Biolabs)+0.5 ul (0.025 units) 5'-Nucleotidase (Sigma),
incubate at 37 degrees C. for 1 hour, heat at 85 degrees C.
for 5 minutes, cool.

(b) Add Exonuclease (see details below). Heat at 95° C.
for 5 minutes, cool.

(c) Add 0.5 ul (15 units) T4 Polynucleotide Kinase (US
Biochemicals), incubate at 37 degrees C. for 30 minutes,
heat at 85 degrees C. for 5 minutes, cool.

cycle 2 - starting volume is 20 ul

(a) same as cycle 1, but incubation is extended to 135
minutes. No AppAp substrate was added.

(b) same as cycle 1.

(c) same as cycle 1.

Sub-Example A: Exonuclease added was 1 ul (0.02 units)
Phosphodiesterase I (US Biochemicals). In this sub-ex-
ample only, 1 ul 100 mM ATP is added during RNA
Ligase incubation in the second cycle to reform the
substrate AppAp from 3',5'-ADP.

Sub-Example B: Exonuclease added was 1 ul (10 units)
Exonuclease I (US Biochemicals).

Sub-Example C: Exonuclease added was 1 ul (0.1 units)
Polynucleotide Phosphorylase (Sigma). In this sub-ex-
ample only, 0.2 mM Na2AsO4 was incorporated in the
buffer throughout the synthesis to facilitate Polynucle-
otide Phosphorylase digestion of unextended primer
chains.

USB TLC confirmed the formation of ApApCpApA product
in all sub-examples. Digestion with RNase A confirmed
the formation of ApA in all sub-examples.

EXAMPLE 19

Synthesis of ApApCpApA using Substrate Reuse,
and AMP Inactivating Enzyme(s)

The following solution was placed in a total volume of 30
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% Triton X-100, 1 mM ApApC
initial primer, and 5 mM AppAp. The following procedure
was performed: 

cycle 1 



(a) Add 1 ul (20 units) T4 RNA Ligase (New England 
Biolabs)+AMP Inactivating Enzyme(s), incubate at 37° 
C. for 3 hours, heat at 85° C. for 5 minutes, cool. 

(b) Add 1 ul (3 units) T4 Polynucleotide Kinase (US 
Biochemicals), incubate at 37° C. for 1 hour, heat at 85° C.
for 5 minutes, cool. 

cycle 2 

(a) same as cycle 1. No AppAp substrate is added. 

(b) same as cycle 1.

Sub-Example A: AMP Inactivating Enzyme was 0.5 ul 
(0.025 units) 5'-Nucleotidase (Sigma) 

Sub-Example B: AMP Inactivating Enzyme was 0.5 ul
(0.025 units) 5'-Nucleotidase (Sigma)+1 ul (0.018 units)
Adenosine Deaminase (Sigma).

Sub-Example C: AMP Inactivating Enzyme was 1 ul (0.004 
units) AMP Deaminase (Sigma).

Sub-Example D: AMP Inactivating Enzyme was 1 ul (0.12 
units) AMP Nucleosidase (E. coli). 

USB TLC confirmed the formation of ApApCpApA product 
in all sub-examples. USB TLC also confirmed that the 
AMP Inactivating Enzymes in all sub-examples converted 
substantially all substrate to product. In all sub-examples, 
butyric TLC confirmed that the oligonucleotide ApA was 
cleaved from the product by RNase A digestion. It was 
also found that adenosine deaminase was not inactivated 
at 95° C., a useful property.

EXAMPLE 20 

Synthesis of ApApCpApApdA using cycles with
and without Substrate Reuse 

The following solution was placed in a total volume of 30 
ul in an Eppendorf tube: 50 mM Tris-Cl, pH 8.0, 10 mM
MgCl2, 10 mM DTT, 0.1% Triton X-100, 1 mM ApApC
initial primer, and 5 mM AppAp. The following procedure 
was performed: 

cycle 1: Reuse 

(a) add 1 ul (20 units) T4 RNA Ligase (New England 
Biolabs), incubate at 37 degrees C. for 1 hour, heat at 85° 
C. for 5 minutes, cool. 

(b) add 1 ul (3 units) T4 Polynucleotide Kinase (US Bio- 
chemicals), incubate at 37 degrees C. for 1 hour, heat at 
85° C. for 5 minutes, cool. 

cycle 2: No Reuse 

(a) add 1 ul (20 units) T4 RNA Ligase (New England 
Biolabs), incubate at 37 degrees C. for 1 hour, heat at 85°
C. for 5 minutes, cool. 

(b) add 1 ul (0.035 units) Nucleotide Pyrophosphatase 
(Sigma, snake venom), incubate at 37° C. for 30 minutes, 
heat at 95° C. for 5 minutes, cool. 

(c) add 1 ul (1.6 units) Alkaline Phosphatase (US Biochemi- 
cals, calf intestine), incubate at 45° C. for 30 minutes, heat 
at 95° C. for 5 minutes, cool. (Alkaline Phosphatases 
generally have better activity at higher temperatures, such 
as 45°-60° C.).

cycle 3: No Reuse 

(a) add 2 ul (40 units) T4 RNA Ligase (New England
Biolabs)+10 ul 10 mM AppdAp, incubate at 37° C. for 80
minutes, heat at 85° C. for 5 minutes, cool. 

(b) same as cycle 2.

(c) same as cycle 2. 

USB TLC strongly suggested formation of ApApCpApA- 
pdA product. Incubation of 5 ul oligonucleotide product 
with 100 ng RNase A (US Biochemicals) at 37° C. for 15 
minutes resulted in the cleavage of the oligonucleotide to 
ApApdA product as strongly suggested by USB and
butyric TLC. Matrix assisted laser desorption mass spectros-
copy confirms formation of this product.

I claim:

1. A method for synthesizing an oligonucleotide of a 
defined sequence, comprising the steps of: 

(a) combining (1) an oligonucleotide primer and (2) a 
blocked nucleotide or a blocked nucleotide precursor 
that forms a blocked nucleotide in situ, in a reaction 
mixture in the presence of a chain extending enzyme 
effective to couple the blocked nucleotide to the 3'-end
of the oligonucleotide primer such that a primer- 
blocked nucleotide product is formed, wherein the 
blocked nucleotide comprises a nucleotide to be added 
to form part of the defined sequence and a blocking 
group attached to the 3'-end of the nucleotide effective
to prevent the addition of more than one blocked 
nucleotide to the primer; 

(b) removing the blocking group from the 3'-end of the
primer-blocked nucleotide product to form a primer- 
nucleotide product, whereby the reaction mixture con- 
tains any unreacted starting materials that may remain, 
primer-nucleotide product and reaction by-products; 
and 

(c) repeating at least one cycle of steps (a) and (b) using 
the primer-nucleotide product from step (b) as the 
oligonucleotide primer of step (a) in the subsequent 
cycle without separation of the primer-nucleotide prod- 
uct from the remainder of the reaction mixture. 

2. A method according to claim 1, wherein the blocking 
group is removed enzymatically.

3. A method according to claim 2, wherein each cycle 
further comprises the additional step of inactivating unre- 
acted blocked nucleotide in the reaction mixture to render it 
less reactive as a substrate for chain extending enzyme. 

4. A method according to claim 3, wherein the chain 
extending enzyme is RNA ligase. 

5. A method according to claim 4, wherein the blocked 
nucleotide is App(d)Np, where N represents any nucleoside 
or nucleoside analog which RNA ligase can couple to an 
oligonucleotide primer. 

6. A method according to claim 5, wherein the blocking 
group is a phosphate and is removed from the primer- 
blocked nucleotide product by a phosphatase. 

7. A method according to claim 5, wherein unreacted 
blocked nucleotide is inactivated by a phosphatase enzyme. 

8. A method according to claim 5, wherein the unreacted 
blocked nucleotide is inactivated by a Dinucleotide Pyro- 
phosphate Degrading Enzyme. 

9. A method according to claim 1, wherein each cycle of 
steps further comprises the additional step of modifying 
uncoupled oligonucleotide primer to prevent its coupling to 
blocked nucleotide in subsequent cycles of the method. 

10. A method according to claim 9, wherein the uncoupled 
oligonucleotide primer is modified by incubating with at 
least one Exonuclease, whereby the uncoupled oligonucle- 
otide primer is degraded. 

11. A method according to claim 9, wherein the uncoupled 
oligonucleotide primer is modified by incubating with a 
chain terminating nucleotide and an enzyme effective to 
couple the chain terminating nucleotide to uncoupled oli- 
gonucleotide primer, whereby uncoupled oligonucleotide 
primer is terminated. 

12. A method according to claim 2, wherein the defined 
sequence includes at least one repeat region which is syn- 
thesized by a method comprising the steps of: 

(1) extending the oligonucleotide primer with 3'-phos-
phate-blocked nucleotide to form 3'-phosphate-blocked
primer-nucleotide;

(2) removing the 3'-phosphate blocking group from the
3'-phosphate-blocked primer-nucleotide substantially
without removing the 3'-phosphate blocking group
from unreacted 3'-phosphate-blocked nucleotide using
3'-Phosphatase; and

(3) repeating steps (1) and (2) using unreacted 3'-phos-
phate-blocked nucleotide from step (2) as the 3'-phos-
phate-blocked nucleotide of step (1).

13. A method for synthesizing a repeat region of an 
oligonucleotide having a defined sequence, said repeat 
region including a repeated nucleotide that appears more 
than once in succession, comprising the steps of: 

(a) enzymatically coupling an oligonucleotide primer
with a 3'-phosphate-blocked repeated nucleotide to 
form a 3'-phosphate blocked primer-nucleotide; 

(b) removing the 3'-phosphate blocking group from the
3'-phosphate-blocked primer-nucleotide using 3'-phos-
phatase enzyme substantially without removing the
3'-phosphate blocking group from unreacted 3'-phos-
phate-blocked repeated nucleotide; and

(c) repeating steps (a) and (b) using the unreacted 3'-phos-
phate-blocked repeated nucleotide from step (b) as the
3'-phosphate-blocked nucleotide of step (a) and using
the deblocked primer-nucleotide product of step (b) as
the oligonucleotide primer of step (a).

14. A method for synthesizing an oligonucleotide, 
wherein the 3'-end of an oligonucleotide primer is coupled
with a blocked nucleotide to form a primer-blocked nucle- 
otide product in a reaction mixture, said blocked nucleotide 
comprising a nucleotide to be added to the oligonucleotide 
primer and a blocking group attached to the 3'-end of the
nucleotide effective to prevent the addition of more than one 
blocked nucleotide to the oligonucleotide primer, compris- 
ing incubating the reaction mixture with an exonuclease, 
whereby any oligonucleotide primer which was not coupled 
is degraded, substantially without degrading the primer- 
blocked nucleotide product. 

15. A method for synthesizing an oligonucleotide, 
wherein the 3'-end of an oligonucleotide primer is enzymati-
cally coupled with a blocked nucleotide to form a primer-
blocked nucleotide product in a reaction mixture, said 
blocked nucleotide comprising a nucleotide to be added to 
the oligonucleotide primer and a removable blocking group 
attached to the 3'-end of the nucleotide effective to prevent 
the addition of more than one blocked nucleotide to the
oligonucleotide primer, comprising incubating the reaction 
mixture with a chain terminating nucleotide and an enzyme 
effective to couple the chain terminating nucleotide to the 
oligonucleotide primer, whereby oligonucleotide primer 
which was not coupled to a blocked nucleotide is end- 
capped to render it unreactive to further coupling, said chain 
terminating nucleotide being different from said blocked 
nucleotide and selected such that end-capped oligonucle- 
otide primer remains end-capped and unreactive when the 
blocking group is removed from the primer-blocked nucle- 
otide product. 

16. A method according to claim 15, wherein the chain 
terminating nucleotide is a dideoxynucleotide. 

17. A method according to claim 14, wherein at least two 
nucleotides are added to the primer without intermediate 
purification of the resulting oligonucleotide product from
other reactants and reaction by-products. 

18. A method according to claim 1, wherein each cycle 
further comprises the additional step of converting adenos- 
ine monophosphate released in the coupling reaction to a 
less reactive form, whereby any inhibitory effect of the
adenosine monophosphate on the coupling of the oligo-
nucleotide primer to the blocked nucleotide is minimized.

19. A method according to claim 1, further comprising the 
step of cleaving a synthesized oligonucleotide to remove 
some or all of the oligonucleotide primer used in the first
cycle of the method from the synthesized oligonucleotide. 

20. A method for coupling a blocked nucleotide AppNp to 
an oligonucleotide primer, characterized in that the blocked 
nucleotide is coupled to the primer using RNA Ligase in the
absence of ATP, and in that pyrophosphate or unactivated
nucleotide substrate, 3',5'-NDP, is used to regenerate free
RNA Ligase from the inactivated adenylylated form,
wherein N represents any nucleoside or nucleoside analog
which RNA ligase can couple to an oligonucleotide primer.

21. A method for coupling a blocked nucleotide to an 
oligonucleotide primer, characterized in that the coupling is 
performed using RNA Ligase in the presence of 5'-Nucle- 
otidase, AMP Nucleotidase or AMP Deaminase, whereby 
AMP released in the coupling reaction is converted to a form
which is less effective than AMP to inhibit the coupling 
reaction or participate as a substrate in a reverse coupling 
reaction. 

22. A method for synthesizing a selected oligonucleotide 
wherein an oligonucleotide primer is extended by enzymati-
cally adding at least two nucleotides to the 3'-end of the
oligonucleotide primer, characterized in that the primer is 
cleaved from the added nucleotides to form the selected 
oligonucleotide. 

23. A method for converting a blocked nucleotide com-
prising a dinucleotide pyrophosphate moiety, a blocking
group effective to prevent the enzymatic coupling of more
than one blocked nucleotide to an oligonucleotide primer, 
and a nucleotide to be enzymatically coupled to the primer 
to a less reactive form, characterized in that the blocked 
nucleotide is treated with a dinucleotide pyrophosphate 
degrading enzyme. 

24. A method according to claim 8, wherein the Dinucle-
otide Pyrophosphate Degrading Enzyme is Nucleotide Pyro-
phosphatase.

25. A method according to claim 23, wherein the Dinucle- 
otide Pyrophosphate Degrading Enzyme is Nucleotide Pyro- 
phosphatase. 

26. A method according to claim 10, wherein the Exonu- 
clease is exonuclease I, phosphodiesterase I, or polynucle- 
otide kinase. 

27. A method according to claim 14, wherein the Exonu-
clease is exonuclease I, phosphodiesterase I, or polynucle-
otide kinase.

28. A method according to claim 18, wherein adenosine 
monophosphate is converted to a less reactive form using 
5'-Nucleotidase. 

29. A method according to claim 18, wherein adenosine 
monophosphate is converted to a less reactive form using 
AMP Nucleosidase. 